Stat: Latest Articles

What matters to graduate students? Experiences at a statistical consulting center from pre‐ to post‐COVID‐19 pandemic
IF 1.7 Zone 4 (Mathematics) Q4 Mathematics Pub Date: 2024-03-04 DOI: 10.1002/sta4.659
Marianne Huebner, Steven J. Pierce, Andrew J. Dennhardt, Hope Akaeze, Nicole Jess, Wenjuan Ma
The COVID‐19 pandemic led to unprecedented changes at all levels of society, including the statistical consulting field. This paper focuses on the experiences of graduate student consultants and clients at our statistical consulting center (SCC), which operates all year, independent of semesters. During the lockdown period, work continued without interruption and was conducted remotely, but there was a temporary reduction in utilization. Advice on statistical methods, help with data analysis and educational offerings are the main reasons clients use SCC services. We describe our mentoring approach for graduate student research assistants (RAs) and how pandemic changes affected RAs and clients. Based on experiences during the pandemic, we offer practical suggestions for SCCs' approaches to research support, work characteristics and collaborations to improve the experiences of graduate students, both as consultants and clients. Most collaboration meetings are now virtual at the request of clients. Telecommuting supports flexible personal schedules and needs. Online educational offerings provide easier access for participants and more opportunities for a wider range of topics and presenters. However, mentoring sessions for RAs are best conducted in person, and every effort should be made to encourage in‐person interactions and collaborations between staff members to advance the effectiveness of post‐pandemic SCCs.
Citations: 0
Highly private large‐sample tests for contingency tables
IF 1.7 Zone 4 (Mathematics) Q4 Mathematics Pub Date: 2024-02-29 DOI: 10.1002/sta4.658
Sungkyu Jung, Seung Woo Kwak
Differential privacy is a foundational concept for safeguarding sensitive individual information when releasing data or statistical analysis results. In this study, we concentrate on the protection of privacy in the context of goodness‐of‐fit (GOF) and independence tests, utilizing perturbed contingency tables that adhere to Gaussian differential privacy within the high‐privacy regime, where the degree of privacy protection increases as the sample size increases. We introduce private test procedures for GOF, independence of two variables and the equality of proportions in paired samples, similar to McNemar's test. For each of these hypothesis testing situations, we propose private test statistics based on the chi-squared statistics and establish their asymptotic null distributions. We numerically confirm that the Type I error rates of the proposed private test procedures are well controlled and that the procedures have adequate power for larger sample sizes and effect sizes. The proposal is demonstrated in private inferences based on the American Time Use Survey data.
Citations: 0
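As a rough illustration of the idea in the abstract above, the sketch below perturbs a two-way contingency table with Gaussian noise and then computes the ordinary Pearson chi-squared statistic on both the raw and the noisy table. It is not the authors' procedure: the function names, the example counts, the sensitivity of 1 and the privacy parameter `mu` are all assumptions made for illustration, and the paper's private statistics and null distributions account for the added noise in a way this naive plug-in does not. The noise scale uses the standard calibration under which adding N(0, (sensitivity/mu)^2) noise to each cell gives mu-Gaussian differential privacy.

```python
import numpy as np


def perturb_table(counts, sigma, rng):
    """Add independent N(0, sigma^2) noise to every cell (Gaussian mechanism)."""
    counts = np.asarray(counts, dtype=float)
    return counts + rng.normal(0.0, sigma, size=counts.shape)


def pearson_chi2(table):
    """Pearson chi-squared statistic for independence in a two-way table."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Expected cell values under independence: row total * column total / n.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    return float(((table - expected) ** 2 / expected).sum())


# Hypothetical inputs, chosen only for illustration.
rng = np.random.default_rng(0)
counts = np.array([[30, 20], [20, 30]])
sensitivity = 1.0          # assumed L2 sensitivity of the table
mu = 0.5                   # assumed Gaussian-DP privacy parameter
sigma = sensitivity / mu   # noise standard deviation for mu-GDP

noisy = perturb_table(counts, sigma, rng)
stat_raw = pearson_chi2(counts)    # statistic on the confidential table
stat_noisy = pearson_chi2(noisy)   # naive plug-in on the released table
print(stat_raw, stat_noisy)
```

Comparing `stat_noisy` against the usual chi-squared reference distribution would ignore the extra variance from the noise, which is the gap the private test statistics in the paper are designed to close.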
Machine collaboration
IF 1.7 Zone 4 (Mathematics) Q4 Mathematics Pub Date: 2024-02-29 DOI: 10.1002/sta4.661
Qingfeng Liu, Yang Feng
We propose a new ensemble framework for supervised learning, called machine collaboration (MaC), using a collection of possibly heterogeneous base learning methods (hereafter, base machines) for prediction tasks. Unlike bagging/stacking (a parallel and independent framework) and boosting (a sequential and top-down framework), MaC is a type of circular and recursive learning framework. The circular and recursive nature helps the base machines to transfer information circularly and update their structures and parameters accordingly. The theoretical result on the risk bound of the estimator from MaC reveals that the circular and recursive feature can help MaC reduce risk via a parsimonious ensemble. We conduct extensive experiments on MaC using both simulated data and 119 benchmark real datasets. The results demonstrate that in most cases, MaC performs significantly better than several other state-of-the-art methods, including classification and regression trees, neural networks, stacking, and boosting.
Citations: 0
Linear mixed models for complex survey data: Implementing and evaluating pairwise likelihood
IF 1.7 Zone 4 (Mathematics) Q4 Mathematics Pub Date: 2024-02-27 DOI: 10.1002/sta4.657
Thomas Lumley, Xudong Huang
As complex-survey data become more widely used in health and social science research, there is increasing interest in fitting a wider range of regression models. We describe an implementation of two-level linear mixed models in R using the pairwise composite likelihood approach of Rao and co-workers. We discuss the computational efficiency of pairwise composite likelihood and compare the estimator to the existing sequential pseudolikelihood estimator in simulations and in data from the Programme for International Student Assessment (PISA) educational survey.
Citations: 0
A note about why deep learning is deep: A discontinuous approximation perspective
IF 1.7 Zone 4 (Mathematics) Q4 Mathematics Pub Date: 2024-02-22 DOI: 10.1002/sta4.654
Yongxin Li, Haobo Qi, Hansheng Wang
Deep learning has achieved unprecedented success in recent years. This approach essentially uses the composition of nonlinear functions to model the complex relationship between input features and output labels. However, a comprehensive theoretical understanding of why the hierarchical layered structure can exhibit superior expressive power is still lacking. In this paper, we provide an explanation for this phenomenon by measuring the approximation efficiency of neural networks with respect to discontinuous target functions. We focus on deep neural networks with rectified linear unit (ReLU) activation functions. We find that to achieve the same degree of approximation accuracy, the number of neurons required by a single‐hidden‐layer (SHL) network is exponentially greater than that required by a multi‐hidden‐layer (MHL) network. In practice, discontinuous points tend to contain highly valuable information (e.g., edges in image classification). We argue that this may be a very important reason accounting for the impressive performance of deep neural networks. We validate our theory in extensive experiments.
Citations: 0
Reproducible research practices: A tool for effective and efficient leadership in collaborative statistics
IF 1.7 Zone 4 (Mathematics) Q4 Mathematics Pub Date: 2024-02-11 DOI: 10.1002/sta4.653
Camille J. Hochheimer, Grace N. Bosma, Lauren Gunn-Sandell, Mary D. Sammel
With data and code sharing policies more common and version control more widely used in statistics, standards for reproducible research are higher than ever. Reproducible research practices must keep up with the fast pace of research. To do so, we propose combining modern practices of leadership with best practices for reproducible research in collaborative statistics as an effective tool for ensuring quality and accuracy while developing stewardship and autonomy in the people we lead. First, we establish a framework for expectations of reproducible statistical research. Then, we introduce Stephen M.R. Covey's theory of trusting and inspiring leadership. These two are combined as we show how stewardship agreements can be used to make reproducible coding a team norm. We provide an illustrative code example and highlight how this method creates a more collaborative rather than evaluative culture where team members hold themselves accountable. The goal of this manuscript is for statisticians to find this application of leadership theory useful and to inspire them to intentionally develop their personal approach to leadership.
Citations: 0
An EWMA sign chart for monitoring processes with fixed and variable sample sizes
IF 1.7 Zone 4 (Mathematics) Q4 Mathematics Pub Date: 2024-02-09 DOI: 10.1002/sta4.652
Abdul Haq
This study addresses limitations in the nonparametric EWMA sign chart with fixed control limits (FCLs), particularly when facing time-varying sample sizes. The FCLs-based EWMA sign chart has a variable conditional false alarm rate (CFAR), especially at the startup of a process or after recovering from an out-of-control signal. To overcome these limitations, we propose a nonparametric EWMA sign chart based on dynamic probability control limits. This chart is capable of monitoring the process target with fixed as well as time-varying sample sizes. Monte Carlo simulations are used to estimate the CFARs and the zero-state (ZS) and steady-state (SS) average run-length profiles of the EWMA sign charts. The proposed chart outperforms the existing chart, particularly in detecting shifts during the process startup, while maintaining the desired CFAR levels in both ZS and SS scenarios. A real data example is given to demonstrate the implementation of the EWMA sign charts.
Citations: 0
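To make the charting statistic in the abstract above concrete, here is a minimal sketch of an EWMA recursion applied to a sign statistic when the sample size changes from one time point to the next. The standardization of the sign count, the smoothing constant `lam`, the starting value of zero and the function name are assumptions for illustration only; the paper's exact statistic and its dynamic probability control limits are not reproduced here.

```python
import numpy as np


def ewma_sign_path(samples, target, lam=0.1):
    """Return the EWMA of standardized sign statistics, one value per sample.

    For a sample of size n, the sign count S = #{x > target} follows
    Binomial(n, 1/2) when the process median equals the target (assuming
    continuous data, so ties with the target have probability zero).
    Standardizing S puts samples of different sizes on a common scale.
    """
    z = 0.0  # assumed starting value of the EWMA
    path = []
    for sample in samples:
        x = np.asarray(sample, dtype=float)
        n = x.size
        s = np.sum(x > target)               # sign count
        u = (s - n / 2.0) / np.sqrt(n / 4.0)  # mean 0, variance 1 in control
        z = lam * u + (1.0 - lam) * z         # EWMA update
        path.append(z)
    return np.array(path)


# Hypothetical data: two samples of different sizes, target median 2.5.
path = ewma_sign_path([[1, 2, 3, 4], [5, 6]], target=2.5, lam=0.1)
print(path)
```

A chart would signal when the EWMA value leaves its control limits; with fixed limits the false alarm rate varies with time and sample size, which is the issue the dynamic limits in the paper address.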
Deep learning models to predict primary open-angle glaucoma
IF 1.7 Zone 4 (Mathematics) Q4 Mathematics Pub Date: 2024-02-07 DOI: 10.1002/sta4.649
Ruiwen Zhou, J. Philip Miller, Mae Gordon, Michael Kass, Mingquan Lin, Yifan Peng, Fuhai Li, Jiarui Feng, Lei Liu
Glaucoma is a major cause of blindness and vision impairment worldwide, and visual field (VF) tests are essential for monitoring the conversion of glaucoma. While previous studies have primarily focused on using VF data at a single time point for glaucoma prediction, there has been limited exploration of longitudinal trajectories. Additionally, many deep learning techniques treat the time-to-glaucoma prediction as a binary classification problem (glaucoma Yes/No), resulting in the misclassification of some censored subjects into the nonglaucoma category and decreased power. To tackle these challenges, we propose and implement several deep-learning approaches that naturally incorporate temporal and spatial information from longitudinal VF data to predict time-to-glaucoma. When evaluated on the Ocular Hypertension Treatment Study (OHTS) dataset, our proposed convolutional neural network (CNN)-long short-term memory (LSTM) emerged as the top-performing model among all those examined. The implementation code can be found online (https://github.com/rivenzhou/VF_prediction).
Citations: 0
Estimation of the density for censored and contaminated data
IF 1.7 Zone 4 (Mathematics) Q4 Mathematics Pub Date: 2024-02-07 DOI: 10.1002/sta4.651
Ingrid Van Keilegom, Elif Kekeç
Consider a situation where one is interested in estimating the density of a survival time that is subject to random right censoring and measurement errors. This happens often in practice, like in public health (pregnancy length), medicine (duration of infection), ecology (duration of forest fire), among others. We assume a classical additive measurement error model with Gaussian noise and unknown error variance and a random right censoring scheme. Under this setup, we develop minimal conditions under which the assumed model is identifiable when no auxiliary variables or validation data are available, and we offer a flexible estimation strategy using Laguerre polynomials for the estimation of the error variance and the density of the survival time. The asymptotic normality of the proposed estimators is established, and the numerical performance of the methodology is investigated on both simulated and real data on gestational age.
Citations: 0
A confidence machine for sparse high-order interaction model
IF 1.7 Zone 4 (Mathematics) Q4 Mathematics Pub Date: 2024-02-05 DOI: 10.1002/sta4.633
Diptesh Das, Eugene Ndiaye, Ichiro Takeuchi
In predictive modelling for high-stakes decision-making, predictors must be not only accurate but also reliable. Conformal prediction (CP) is a promising approach for obtaining the coverage of prediction results with fewer theoretical assumptions. To obtain the prediction set by so-called full-CP, we need to refit the predictor for all possible values of prediction results, which is only possible for simple predictors. For complex predictors such as random forests (RFs) or neural networks (NNs), split-CP is often employed, where the data is split into two parts: one part for fitting and another for computing the prediction set. Unfortunately, because of the reduced sample size, split-CP is inferior to full-CP in both fitting and prediction set computation. In this paper, we develop a full-CP of sparse high-order interaction model (SHIM), which is sufficiently flexible as it can take into account high-order interactions among variables. We resolve the computational challenge for full-CP of SHIM by introducing a novel approach called homotopy mining. Through numerical experiments, we demonstrate that SHIM is as accurate as complex predictors such as RF and NN and enjoys the superior statistical power of full-CP.
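The split-CP baseline mentioned in the abstract above can be sketched in a few lines. This is a generic textbook version with a straight-line predictor, not the full-CP or homotopy mining method of the paper; the function name, the 50/50 split, the absolute-residual score and the simulated data are all assumptions for illustration.

```python
import numpy as np


def split_conformal_interval(x, y, x_new, alpha=0.1, rng=None):
    """Split conformal prediction interval for a least-squares line.

    Half of the data fits the predictor; the other half (calibration set)
    supplies absolute residuals whose quantile sets the interval width.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    idx = rng.permutation(n)
    fit_idx, cal_idx = idx[: n // 2], idx[n // 2:]

    # Fit a straight line on the fitting half only.
    slope, intercept = np.polyfit(x[fit_idx], y[fit_idx], deg=1)

    def predict(t):
        return slope * t + intercept

    # Conformity scores on the held-out calibration half.
    scores = np.sort(np.abs(y[cal_idx] - predict(x[cal_idx])))
    m = len(scores)
    # Finite-sample quantile index: ceil((1 - alpha) * (m + 1)).
    k = int(np.ceil((1.0 - alpha) * (m + 1)))
    q = np.inf if k > m else scores[k - 1]

    center = predict(x_new)
    return center - q, center + q


# Hypothetical simulated data.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, 200)
lo, hi = split_conformal_interval(x, y, x_new=0.5, alpha=0.1, rng=rng)
print(lo, hi)
```

Only half of the observations are used to fit the line and only the other half to calibrate the width, which is the loss of sample size that the abstract cites as the drawback of split-CP relative to full-CP.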
Citations: 0
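The split-CP procedure mentioned in the abstract above (one part of the data for fitting, the other for computing the prediction set) can be sketched as follows. This is only an illustrative sketch, not the paper's method: the least-squares predictor, the absolute-residual conformity score, and the function name `split_conformal_interval` are all assumptions chosen for brevity, and SHIM, full-CP, and homotopy mining are not implemented here.

```python
import numpy as np


def split_conformal_interval(X, y, x_new, alpha=0.1, seed=0):
    """Split-CP interval around a least-squares prediction (illustrative only).

    X has shape (n, d), y has shape (n,), x_new has shape (d,).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    perm = rng.permutation(n)
    fit_idx, cal_idx = perm[: n // 2], perm[n // 2:]

    # Part 1: fit the (assumed) linear predictor on the first half only.
    A_fit = np.column_stack([np.ones(len(fit_idx)), X[fit_idx]])
    beta, *_ = np.linalg.lstsq(A_fit, y[fit_idx], rcond=None)

    # Part 2: absolute residuals on the held-out half act as conformity scores.
    A_cal = np.column_stack([np.ones(len(cal_idx)), X[cal_idx]])
    scores = np.sort(np.abs(y[cal_idx] - A_cal @ beta))

    # Finite-sample quantile: the ceil((m + 1)(1 - alpha))-th smallest score.
    # If that rank exceeds m, the interval is the whole real line.
    m = len(scores)
    k = int(np.ceil((m + 1) * (1 - alpha)))
    q = scores[k - 1] if k <= m else np.inf

    pred = float(np.concatenate([[1.0], np.atleast_1d(x_new)]) @ beta)
    return pred - q, pred + q
```

Because only half of the observations are used for fitting and only the other half for calibrating the interval width, the sketch makes concrete the "reduced sample size" cost of split-CP that the abstract contrasts with full-CP.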