
Stat: Latest Articles

The current landscape of academic statistical and data science collaboration units with examples
IF 1.7, Zone 4 (Mathematics), Q3 Statistics & Probability, Pub Date: 2024-07-26, DOI: 10.1002/sta4.718
Julia Sharp, Emily H. Griffith, Bruce A. Craig, Alexandra Hanlon, Sarah Peskoe, Jennifer Van Mullekom
The delivery of academic statistical collaboration resources can vary among types of institutions and across time. In particular, this variation might occur in the management of infrastructure and the business model, the staffing model and opportunities for staff development. In this manuscript, we present examples of these three themes in modern academic statistical collaboration units and describe key advantages and challenges.
Citations: 0
New two‐sample test utilizing interpoint distance discrepancy
IF 1.7, Zone 4 (Mathematics), Q3 Statistics & Probability, Pub Date: 2024-07-22, DOI: 10.1002/sta4.712
Dong Xu
In this paper, we propose a novel two‐sample test for multivariate sample space. The test statistic calculates the mean of absolute difference of average interpoint distance. We utilize a permutation procedure to establish the critical value for the test. Through comprehensive simulation studies, we compare the performance of our proposed test with that of the K‐nearest neighbour test and the energy test. The results demonstrate that our proposed test exhibits advantages over the other two tests, particularly in high‐dimensional sample spaces. This superiority is further validated by its application to UCR time series datasets.
Citations: 0
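The two‐sample entry above describes a statistic built from average interpoint distances, calibrated by permutation. The sketch below is one possible reading, not the paper's definition: the helper names (`mean_dist`, `interpoint_stat`, `permutation_pvalue`) are hypothetical, and the statistic is assumed to be the mean, over all pooled points, of the absolute difference between a point's average distance to the first sample and its average distance to the second.

```python
import math
import random

def mean_dist(point, pts):
    """Average Euclidean distance from `point` to every point in `pts`."""
    return sum(math.dist(point, q) for q in pts) / len(pts)

def interpoint_stat(xs, ys):
    """Assumed statistic: mean over pooled points of
    |avg distance to xs - avg distance to ys|."""
    pooled = xs + ys
    return sum(abs(mean_dist(p, xs) - mean_dist(p, ys)) for p in pooled) / len(pooled)

def permutation_pvalue(xs, ys, n_perm=200, seed=0):
    """Permutation p-value: reshuffle group labels and recompute the statistic."""
    rng = random.Random(seed)
    observed = interpoint_stat(xs, ys)
    pooled = xs + ys
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if interpoint_stat(pooled[:len(xs)], pooled[len(xs):]) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (exceed + 1) / (n_perm + 1)

# Toy usage: two 5-dimensional Gaussian samples with shifted means.
demo_rng = random.Random(42)
sample_a = [tuple(demo_rng.gauss(0, 1) for _ in range(5)) for _ in range(15)]
sample_b = [tuple(demo_rng.gauss(2, 1) for _ in range(5)) for _ in range(15)]
print("p-value:", permutation_pvalue(sample_a, sample_b, n_perm=99))
```

The permutation step is what supplies the critical value: no distributional assumption is needed, at the cost of recomputing the statistic once per shuffle.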
Sign‐flip inference for spatial regression with differential regularisation
IF 1.7, Zone 4 (Mathematics), Q3 Statistics & Probability, Pub Date: 2024-07-17, DOI: 10.1002/sta4.711
Michele Cavazzutti, Eleonora Arnone, Federico Ferraccioli, Cristina Galimberti, Livio Finos, Laura M. Sangalli
We address the problem of performing inference on the linear and nonlinear terms of a semiparametric spatial regression model with differential regularisation. For the linear term, we propose a new resampling procedure, based on (partial) sign‐flipping of an appropriate transformation of the residuals of the model. The proposed resampling scheme can mitigate the bias effect induced by the differential regularisation. We prove that the proposed test is asymptotically exact. Moreover, we show, by simulation studies, that it also enjoys very good control of Type‐I error in small sample scenarios, unlike parametric alternatives. Additionally, we show that the proposed test has higher power than recently proposed nonparametric tests on the linear term of semiparametric regression models with differential regularisation. Concerning the nonlinear term, we develop three different inference approaches: a parametric one and two nonparametric alternatives. The nonparametric tests are based on a sign‐flip approach. One of these is proved to be asymptotically exact, while the other is proved to be exact also for finite samples. Simulation studies highlight the good control of Type‐I error of the nonparametric approaches with respect to the parametric test, while retaining high power.
Citations: 0
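To illustrate the general idea of sign‐flipping mentioned in the entry above, here is a minimal sketch of a basic sign‐flip test for a zero location parameter. It is not the paper's procedure, which flips a transformation of the residuals of a regularised spatial regression model; the function name `sign_flip_pvalue` and the choice of the absolute sum as test statistic are assumptions made for illustration.

```python
import random

def sign_flip_pvalue(scores, n_flips=999, seed=0):
    """Two-sided sign-flip p-value for H0: the scores are symmetric about zero.

    Under H0, randomly flipping the sign of each score leaves the
    distribution of the sum unchanged, so the flipped sums form a
    reference distribution for the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(scores))
    exceed = 0
    for _ in range(n_flips):
        flipped = sum(s if rng.random() < 0.5 else -s for s in scores)
        if abs(flipped) >= observed:
            exceed += 1
    return (exceed + 1) / (n_flips + 1)

# Toy usage: scores with a clear positive shift.
shifted = [1.0 + 0.1 * i for i in range(20)]
print("p-value for shifted scores:", sign_flip_pvalue(shifted))
```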
A Markov‐switching spatio‐temporal ARCH model
IF 1.7, Zone 4 (Mathematics), Q3 Statistics & Probability, Pub Date: 2024-07-15, DOI: 10.1002/sta4.713
Tzung Hsuen Khoo, Dharini Pathmanathan, Philipp Otto, Sophie Dabo‐Niang
Stock market indices are volatile by nature, and sudden shocks are known to affect volatility patterns. The autoregressive conditional heteroskedasticity (ARCH) and generalized ARCH (GARCH) models neglect structural breaks triggered by sudden shocks that may lead to an overestimation of persistence, causing an upward bias in the estimates. Different regime‐switching models that have abrupt regime‐switching governed by a Markov chain were developed to model volatility in financial time series data. Volatility modelling was also extended to spatially interconnected time series, resulting in spatial variants of ARCH models. This inspired us to propose a Markov switching framework of the spatio‐temporal log‐ARCH model. In this article, we discuss the Markov‐switching extension of the model, the estimation procedure and the smooth inferences of the regimes. The Monte Carlo simulation studies show that the maximum likelihood estimation method for our proposed model has good finite sample properties. The proposed model was applied to 28 stock indices' data that were presumably affected by the 2015–2016 Chinese stock market crash. The results showed that our model is a better fit compared to that of the one‐regime counterpart. Furthermore, the smoothed inference of the data indicated the approximate periods where structural breaks occurred. This model can capture structural breaks that simultaneously occur in nearby locations.
Citations: 0
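The regime‐switching idea in the entry above can be sketched with a purely temporal toy model: a two‐regime ARCH(1) process whose regime follows a Markov chain. This is a simplification, not the spatio‐temporal log‐ARCH model of the paper; the function name `simulate_ms_arch` and all parameter values (`omega`, `alpha`, `stay`) are hypothetical.

```python
import math
import random

def simulate_ms_arch(n, omega=(0.1, 1.0), alpha=(0.2, 0.5),
                     stay=(0.95, 0.90), seed=0):
    """Simulate n steps of a two-regime Markov-switching ARCH(1) series.

    omega[s], alpha[s]: ARCH(1) intercept and coefficient in regime s.
    stay[s]: probability of remaining in regime s at each step.
    Returns (states, series).
    """
    rng = random.Random(seed)
    state, prev = 0, 0.0
    states, series = [], []
    for _ in range(n):
        # Abrupt regime change governed by the Markov chain.
        if rng.random() > stay[state]:
            state = 1 - state
        # Conditional variance depends on the regime and the last shock.
        variance = omega[state] + alpha[state] * prev ** 2
        prev = math.sqrt(variance) * rng.gauss(0.0, 1.0)
        states.append(state)
        series.append(prev)
    return states, series

states, series = simulate_ms_arch(1000)
print("share of time in regime 1:", sum(states) / len(states))
```

Fitting a single‐regime ARCH model to such a series would have to absorb the jumps in the variance level into its persistence parameter, which is the overestimation effect the abstract refers to.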
Using sliced inverse mean difference for dimension reduction in multivariate time series
IF 1.7, Zone 4 (Mathematics), Q3 Statistics & Probability, Pub Date: 2024-07-13, DOI: 10.1002/sta4.709
Hector Haffenden, Andreas Artemiou
Following recent developments of dimension reduction algorithms for a multivariate time series, we propose in this work the adaptation of sliced inverse mean difference algorithm, an algorithm which was previously proposed in a standard multiple regression setting, to develop an algorithm appropriate to perform dimension reduction for a multivariate time series. The resulting algorithm called time series sliced inverse mean difference (TSIMD) is shown to be able to identify important directions and important lags using less significant pairs than previously proposed algorithms for dimension reduction in multivariate time series. We demonstrate the competitive performance of our algorithms through a number of experiments.
Citations: 0
What is it that you say you do here? Advocating for the critical role of data scientists in research infrastructure
IF 1.7, Zone 4 (Mathematics), Q3 Statistics & Probability, Pub Date: 2024-07-11, DOI: 10.1002/sta4.714
Chasz Griego, Nicky Agate, Ana‐Maria Iosif, Amy M. Crisp
Clinical and academic research continues to become more complex as our knowledge and technology advance. A substantial and growing number of specialists in biostatistics, data science and library sciences are needed to support these research systems and promote high‐calibre research. However, that support is often marginalized as optional rather than a fundamental component of research infrastructure. By building research infrastructure, an institution harnesses access to tools and support/service centres that host skilled experts who approach research with best practices in mind and domain‐specific knowledge at hand. We outline the potential roles of data scientists and statisticians in research infrastructure and recommend guidelines for advocating for the institutional resources needed to support these roles in a sustainable and efficient manner for the long‐term success of the institution. We provide these guidelines in terms of resource efficiency, monetary efficiency and long‐term sustainability. We hope this work contributes to—and provides shared language for—a conversation on a broader framework beyond metrics that can be used to advocate for needed resources.
Citations: 0
A sharper bound of the Hotelling–Solomons inequality
IF 1.7, Zone 4 (Mathematics), Q3 Statistics & Probability, Pub Date: 2024-07-09, DOI: 10.1002/sta4.710
Yuzo Maruyama
The original Hotelling–Solomons inequality states that an upper bound of the absolute difference between the mean and median, standardised by the standard deviation, is 1. However, in this paper, we introduce a new bound that depends on the sample size, which is strictly smaller than 1.
Citations: 0
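The classical inequality in the entry above, |mean − median| / sd ≤ 1, is easy to check numerically. The sketch below computes the standardised gap for a sample, using the divide‐by‐n standard deviation; the function name `standardized_gap` is hypothetical, and the sample‐size‐dependent bound from the paper is not reproduced here.

```python
import random
import statistics

def standardized_gap(sample):
    """|mean - median| divided by the population (divide-by-n) standard deviation."""
    gap = abs(statistics.fmean(sample) - statistics.median(sample))
    return gap / statistics.pstdev(sample)

# Toy usage: a right-skewed sample, where mean and median differ noticeably.
rng = random.Random(0)
skewed = [rng.expovariate(1.0) for _ in range(25)]
print("standardised gap:", standardized_gap(skewed))
```

A constant sample has zero standard deviation, so the ratio is undefined there; the sketch does not guard against that case.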
Tensor factor adjustment for image classification with pervasive noises
IF 1.7, Zone 4 (Mathematics), Q3 Statistics & Probability, Pub Date: 2024-06-27, DOI: 10.1002/sta4.705
Xiaochuan Li, Bingnan Li, Wenzhan Song, Yuan Ke
This paper studies a tensor factor model that augments samples from multiple classes. The nuisance common patterns shared across classes are characterised by pervasive noises, and the patterns that distinguish different classes are represented by class‐specific components. Additionally, the pervasive component is modelled by the product of a low‐rank tensor latent factor and several factor loading matrices. This augmented tensor factor model can be expanded to a series of matrix variate tensor factor models and estimated using principal component analysis. The ranks of latent factors are estimated using a modified eigen‐ratio method. The proposed estimators have fast convergence rates and enjoy the blessing of dimensionality. The proposed factor model is applied to address the challenge of overlapping issues in image classification through a factor adjustment procedure. The procedure is shown to be powerful through synthetic experiments and an application to COVID‐19 pneumonia diagnosis from frontal chest X‐ray images.
Citations: 0
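The entry above estimates factor ranks with a modified eigen‐ratio method. As a rough illustration, the plain (unmodified) eigenvalue‐ratio rule picks the rank at which consecutive eigenvalues drop most sharply. The sketch assumes the eigenvalues of the relevant covariance‐type matrix are already available and strictly positive; the function name `eigen_ratio_rank` and the cut‐off `k_max` are hypothetical, and the paper's modification is not shown.

```python
def eigen_ratio_rank(eigenvalues, k_max):
    """Plain eigenvalue-ratio rank estimate.

    Sorts the eigenvalues in decreasing order and returns the k in
    1..k_max that maximises lambda_k / lambda_{k+1}.
    Requires k_max < len(eigenvalues) and positive eigenvalues.
    """
    lam = sorted(eigenvalues, reverse=True)
    ratios = [lam[k] / lam[k + 1] for k in range(k_max)]
    best = max(range(k_max), key=ratios.__getitem__)
    return best + 1

# Toy usage: three large eigenvalues followed by a flat tail.
print(eigen_ratio_rank([50, 40, 30, 1.0, 0.9, 0.8, 0.7], k_max=5))
```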
Solving the missing at random problem in semi‐supervised learning: An inverse probability weighting method
IF 1.7, Zone 4 (Mathematics), Q3 Statistics & Probability, Pub Date: 2024-06-23, DOI: 10.1002/sta4.707
Jin Su, Shuyi Zhang, Yong Zhou
We propose an estimator for the population mean under the semi‐supervised learning setting with the Missing at Random (MAR) assumption. This setting assumes that the probability of observing a label depends on the total sample size and satisfies a condition given in the paper. To estimate the population mean efficiently, we introduce an adaptive estimator based on inverse probability weighting and cross‐fitting. Theoretical analysis reveals that our proposed estimator is consistent and efficient, with a convergence rate slower than the typical rate, due to the diminishing proportion of labelled data as the sample size increases in the semi‐supervised setting. We also prove the consistency of inverse probability weighting (IPW)–Nadaraya–Watson density function estimators. Extensive simulations and an application to the Los Angeles homeless data validate the effectiveness of our approach.
Citations: 0
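The core of inverse probability weighting, as used in the entry above, can be shown on toy data. The sketch assumes the labelling probabilities are known, whereas the paper estimates them and uses cross‐fitting; the names `ipw_mean` and `simulate_mar`, and the linear propensity model, are hypothetical choices for illustration.

```python
import random

def ipw_mean(ys, observed, propensities):
    """Inverse-probability-weighted estimate of the population mean of Y.

    ys[i] is only used when observed[i] is True; propensities[i] is the
    (assumed known) probability that unit i is labelled.
    """
    total = sum(y / p for y, r, p in zip(ys, observed, propensities) if r)
    return total / len(ys)

def simulate_mar(n, seed=0):
    """Toy MAR data: the labelling probability depends on the covariate x only."""
    rng = random.Random(seed)
    ys, observed, propensities = [], [], []
    for _ in range(n):
        x = rng.random()
        p = 0.2 + 0.6 * x                               # hypothetical propensity
        ys.append(2.0 + 3.0 * x + rng.gauss(0.0, 1.0))  # true mean of Y is 3.5
        observed.append(rng.random() < p)
        propensities.append(p)
    return ys, observed, propensities

ys, observed, propensities = simulate_mar(20000)
labelled = [y for y, r in zip(ys, observed) if r]
print("naive mean of labelled units:", sum(labelled) / len(labelled))
print("IPW estimate:", ipw_mean(ys, observed, propensities))
```

Because units with large x are both more likely to be labelled and have larger outcomes, the naive mean of the labelled units is biased upward in this toy setup, while the weighting corrects for the selection.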
Graph spatial sampling
IF 1.7, Zone 4 (Mathematics), Q3 Statistics & Probability, Pub Date: 2024-06-23, DOI: 10.1002/sta4.708
Li‐Chun Zhang
We develop lagged Metropolis–Hastings walk for sampling from simple undirected graphs according to given stationary sampling probabilities. It is explained how the technique can be applied together with designed graphs for sampling of units‐in‐space. Compared with the existing spatial sampling methods, which chiefly focus on the sample spatial balance regardless of the associated outcomes of interest, the proposed graph spatial sampling method can considerably improve the efficiency because the graph can be designed to take into account the anticipated spatial distribution of the outcome of interest.
Citations: 0
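As background for the entry above, the following sketch shows a standard Metropolis–Hastings random walk on a simple undirected graph that targets given stationary probabilities. It is the ordinary walk, not the lagged variant developed in the paper; the function name `mh_walk` and the example graph are hypothetical.

```python
import random

def mh_walk(adj, target, start, n_steps, seed=0):
    """Metropolis-Hastings walk on an undirected graph.

    adj: dict mapping each node to a list of its neighbours.
    target: dict mapping each node to its desired stationary probability
            (only ratios matter, so it need not be normalised).
    Returns the list of visited nodes, one per step.
    """
    rng = random.Random(seed)
    node, path = start, []
    for _ in range(n_steps):
        # Propose a uniformly chosen neighbour of the current node.
        proposal = rng.choice(adj[node])
        # Acceptance ratio corrects for unequal degrees and target weights.
        ratio = (target[proposal] * len(adj[node])) / (target[node] * len(adj[proposal]))
        if rng.random() < ratio:
            node = proposal
        path.append(node)
    return path

# Toy usage on a four-node connected graph.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
target = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
path = mh_walk(adj, target, start=0, n_steps=50000)
print({node: path.count(node) / len(path) for node in adj})
```

On a connected graph the visit frequencies approach the target probabilities as the walk lengthens, which is what allows the graph to be designed around the anticipated spatial distribution of the outcome.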