Stat: Latest Articles

Deep learning models to predict primary open-angle glaucoma
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-02-07 DOI: 10.1002/sta4.649
Ruiwen Zhou, J. Philip Miller, Mae Gordon, Michael Kass, Mingquan Lin, Yifan Peng, Fuhai Li, Jiarui Feng, Lei Liu
Glaucoma is a major cause of blindness and vision impairment worldwide, and visual field (VF) tests are essential for monitoring the conversion of glaucoma. While previous studies have primarily focused on using VF data at a single time point for glaucoma prediction, there has been limited exploration of longitudinal trajectories. Additionally, many deep learning techniques treat the time-to-glaucoma prediction as a binary classification problem (glaucoma Yes/No), resulting in the misclassification of some censored subjects into the nonglaucoma category and decreased power. To tackle these challenges, we propose and implement several deep-learning approaches that naturally incorporate temporal and spatial information from longitudinal VF data to predict time-to-glaucoma. When evaluated on the Ocular Hypertension Treatment Study (OHTS) dataset, our proposed convolutional neural network (CNN)-long short-term memory (LSTM) emerged as the top-performing model among all those examined. The implementation code can be found online (https://github.com/rivenzhou/VF_prediction).
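The point about censored subjects can be illustrated with a small sketch. It is not taken from the paper or its repository; the `Subject` class, the toy cohort and the 5-year horizon are all hypothetical. It shows how a naive Yes/No label at a fixed horizon silently assigns a censored subject to the non-glaucoma class even though that subject's status at the horizon is unknown.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    follow_up_years: float  # time to conversion, or to censoring if no event
    event_observed: bool    # True if conversion to glaucoma was observed


def binary_label_at(subject: Subject, horizon: float) -> int:
    """Naive binary label: 1 if conversion was observed by `horizon`, else 0."""
    return int(subject.event_observed and subject.follow_up_years <= horizon)


def label_is_known(subject: Subject, horizon: float) -> bool:
    """Status at `horizon` is known only if the event occurred by then,
    or the subject was followed event-free at least up to `horizon`."""
    if subject.event_observed and subject.follow_up_years <= horizon:
        return True
    return subject.follow_up_years >= horizon


# Hypothetical toy cohort (illustrative values only).
cohort = [
    Subject(2.0, True),   # converted at year 2
    Subject(3.0, False),  # censored at year 3: status at year 5 is unknown
    Subject(6.0, False),  # event-free through year 6
    Subject(7.0, True),   # converted at year 7, so event-free at year 5
]
horizon = 5.0
naive = [binary_label_at(s, horizon) for s in cohort]
known = [label_is_known(s, horizon) for s in cohort]
print(naive)  # the censored subject is labelled 0
print(known)  # but its true status at the horizon is not known
```

A time-to-event formulation keeps the pair (follow-up time, event indicator) for every subject instead of collapsing it to a single label, which is what lets censored subjects contribute without being misclassified.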
Citations: 0
Estimation of the density for censored and contaminated data
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-02-07 DOI: 10.1002/sta4.651
Ingrid Van Keilegom, Elif Kekeç
Consider a situation where one is interested in estimating the density of a survival time that is subject to random right censoring and measurement errors. This happens often in practice, like in public health (pregnancy length), medicine (duration of infection), ecology (duration of forest fire), among others. We assume a classical additive measurement error model with Gaussian noise and unknown error variance and a random right censoring scheme. Under this setup, we develop minimal conditions under which the assumed model is identifiable when no auxiliary variables or validation data are available, and we offer a flexible estimation strategy using Laguerre polynomials for the estimation of the error variance and the density of the survival time. The asymptotic normality of the proposed estimators is established, and the numerical performance of the methodology is investigated on both simulated and real data on gestational age.
Citations: 0
A confidence machine for sparse high-order interaction model
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-02-05 DOI: 10.1002/sta4.633
Diptesh Das, Eugene Ndiaye, Ichiro Takeuchi
In predictive modelling for high-stake decision-making, predictors must be not only accurate but also reliable. Conformal prediction (CP) is a promising approach for obtaining the coverage of prediction results with fewer theoretical assumptions. To obtain the prediction set by so-called full-CP, we need to refit the predictor for all possible values of prediction results, which is only possible for simple predictors. For complex predictors such as random forests (RFs) or neural networks (NNs), split-CP is often employed where the data is split into two parts: one part for fitting and another for computing the prediction set. Unfortunately, because of the reduced sample size, split-CP is inferior to full-CP both in fitting as well as prediction set computation. In this paper, we develop a full-CP of sparse high-order interaction model (SHIM), which is sufficiently flexible as it can take into account high-order interactions among variables. We resolve the computational challenge for full-CP of SHIM by introducing a novel approach called homotopy mining. Through numerical experiments, we demonstrate that SHIM is as accurate as complex predictors such as RF and NN and enjoys the superior statistical power of full-CP.
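As a rough illustration of the split-CP procedure described in this abstract (one part of the data for fitting, the other for computing the prediction set), here is a minimal sketch. It is not the authors' SHIM or homotopy-mining method: the predictor is an ordinary least-squares line, the data are simulated, and every function name is hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def split_conformal_interval(xs, ys, x_new, alpha=0.1, seed=0):
    """Split conformal interval for y at x_new with target coverage 1 - alpha."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    half = len(idx) // 2
    fit_idx, cal_idx = idx[:half], idx[half:]

    # Part 1: fit the predictor on the first half only.
    slope, intercept = fit_line([xs[i] for i in fit_idx],
                                [ys[i] for i in fit_idx])

    # Part 2: absolute residuals on the held-out calibration half.
    scores = sorted(abs(ys[i] - (slope * xs[i] + intercept)) for i in cal_idx)
    n_cal = len(scores)
    k = math.ceil((1 - alpha) * (n_cal + 1))  # finite-sample quantile rank
    if k > n_cal:
        return -math.inf, math.inf  # too few calibration points
    q = scores[k - 1]
    pred = slope * x_new + intercept
    return pred - q, pred + q


# Simulated data (assumption): y = 2x + 1 plus Gaussian noise.
data_rng = random.Random(1)
xs = [data_rng.uniform(0, 1) for _ in range(200)]
ys = [2 * x + 1 + data_rng.gauss(0, 0.3) for x in xs]
lo, hi = split_conformal_interval(xs, ys, x_new=0.5)
print(lo, hi)
```

Only half of the sample is used to fit the line and only the other half to calibrate the interval width, which is the efficiency loss relative to full-CP that the abstract refers to.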
Citations: 0
Softplus negative binomial network autoregression
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-01-18 DOI: 10.1002/sta4.638
Xiangyu Guo, Fukang Zhu
Modelling multivariate time series of counts in a parsimonious way is a popular topic. In this paper, we consider an integer-valued network autoregressive model with a non-random neighbourhood structure, which uses negative binomial distribution as the conditional marginal distribution and the softplus function as the link function. The new model generalizes existing ones in the literature and has a great flexibility in modelling. Stationary conditions in cases of fixed dimension and increasing dimension are given. Parameters are estimated by maximizing the quasi-likelihood function, and related asymptotic properties of the estimators are established. A simulation study is conducted to assess performances of the estimators, and a real data example is analysed to show superior performances of the proposed model compared with existing ones.
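To make the ingredients named in this abstract concrete, the sketch below simulates counts on a small fixed network with a softplus link and a negative binomial conditional distribution. The exact model specification is not given in the abstract, so the recursion used here (intercept, neighbour average and own lag, all at lag one), the parameter names and the parameter values are assumptions for illustration only.

```python
import math
import random


def softplus(x):
    """Numerically stable log(1 + exp(x)); positive for any moderate real x."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sample_poisson(rng, mean):
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_negbin(rng, mean, dispersion):
    """Gamma-Poisson mixture: mean `mean`, variance mean + mean**2 / dispersion."""
    rate = rng.gammavariate(dispersion, mean / dispersion)
    return sample_poisson(rng, rate)


def simulate(adjacency, steps, b0, b_net, b_self, dispersion, seed=0):
    """Simulate counts with an assumed lag-one network autoregression."""
    rng = random.Random(seed)
    n = len(adjacency)
    y = [0] * n
    path = [y]
    for _ in range(steps):
        new = []
        for i in range(n):
            nbrs = [j for j in range(n) if adjacency[i][j]]
            net = sum(y[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
            # Softplus keeps the conditional mean positive even though
            # the coefficient on the own lag is negative here.
            mean = softplus(b0 + b_net * net + b_self * y[i])
            new.append(sample_negbin(rng, mean, dispersion))
        y = new
        path.append(y)
    return path


# Hypothetical three-node network with a fixed (non-random) neighbourhood.
adjacency = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
path = simulate(adjacency, steps=50, b0=1.0, b_net=0.4, b_self=-0.2,
                dispersion=2.0)
print(path[-1])
```

With a log link every coefficient would act multiplicatively on the mean, while an identity link would need non-negative coefficients to keep the mean positive. The softplus link is roughly linear for large arguments yet still positive, which is one reason it can accommodate negative dependence.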
Citations: 0
Ordered probit Bayesian additive regression trees for ordinal data
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-01-17 DOI: 10.1002/sta4.643
Jaeyong Lee, Beom Seuk Hwang
Bayesian additive regression trees (BART) is a nonparametric model that is known for its flexibility and strong statistical foundation. To address a robust and flexible approach to analyse ordinal data, we extend BART into an ordered probit regression framework (OPBART). Further, we propose a semiparametric setting for OPBART (semi-OPBART) to model covariates of interest parametrically and confounding variables nonparametrically. We also provide Gibbs sampling procedures to implement the proposed models. In both simulations and real data studies, the proposed models demonstrate superior performance over other competing ordinal models. We also highlight enhanced interpretability of semi-OPBART in terms of inference through marginal effects.
Citations: 0
Differentially private outcome-weighted learning for optimal dynamic treatment regime estimation
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-01-17 DOI: 10.1002/sta4.641
Dylan Spicker, Erica E. M. Moodie, Susan M. Shortreed
Precision medicine is a framework for developing evidence-based medical recommendations that seeks to determine the optimal sequence of treatments, tailored to all of the relevant, observable patient-level characteristics. Because precision medicine relies on highly sensitive, patient-level data, ensuring the privacy of participants is of great importance. Dynamic treatment regimes (DTRs) provide one formalization of precision medicine in a longitudinal setting. Outcome-weighted learning (OWL) is a family of techniques for estimating optimal DTRs based on observational data. OWL techniques leverage support vector machine (SVM) classifiers in order to perform estimation. SVMs perform classification based on a set of influential points in the data known as support vectors. The classification rule produced by SVMs often requires direct access to the support vectors. Thus, releasing a treatment policy estimated with OWL requires the release of patient data for a subset of patients in the sample. As a result, the classification rules from SVMs constitute a severe privacy violation for those individuals whose data comprise the support vectors. This privacy violation is a major concern, particularly in light of the potentially highly sensitive medical data that are used in DTR estimation. Differential privacy has emerged as a mathematical framework for ensuring the privacy of individual-level data, with provable guarantees on the likelihood that individual characteristics can be determined by an adversary. We provide the first investigation of differential privacy in the context of DTRs and provide a differentially private OWL estimator, with theoretical results allowing us to quantify the cost of privacy in terms of the accuracy of the private estimators.
Citations: 0
Development of network-guided transcriptomic risk score for disease prediction
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-01-16 DOI: 10.1002/sta4.648
Xuan Cao, Liangliang Zhang, Kyoungjae Lee
Omics data, routinely collected in various clinical settings, are of a complex and network-structured nature. Recent progress in RNA sequencing (RNA-seq) allows us to explore whole-genome gene expression profiles and to develop predictive models for disease risk. In this study, we propose a novel Bayesian approach to construct an RNA-seq-based risk score leveraging the gene expression network for disease risk prediction. Specifically, we consider a hierarchical model with spike and slab priors over regression coefficients as well as entries in the inverse covariance matrix for covariates to simultaneously perform variable selection and network estimation in high-dimensional logistic regression. Through theoretical investigation and simulation studies, our method is shown to both enjoy desirable consistency properties and achieve superior empirical performance compared with other state-of-the-art methods. We analyse RNA-seq gene expression data from 441 asthmatic and 254 non-asthmatic samples to form a weighted network-guided risk score and benchmark the proposed method against existing approaches for asthma risk stratification.
Citations: 0
Some benefits of standardisation for conditional extremes
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-01-14 DOI: 10.1002/sta4.647
Christian Rohrbeck, Jonathan A. Tawn
A key aspect where extreme values methods differ from standard statistical models is through having asymptotic theory to provide a theoretical justification for the nature of the models used for extrapolation. In multivariate extremes, many different asymptotic theories have been proposed, partly as a consequence of the lack of ordering property with vector random variables. One class of multivariate models, based on conditional limit theory as one variable becomes extreme, has received wide practical usage. The underpinning value of this approach has been supported by further theoretical characterisations of the limiting relationships. However, the paper “Conditional extreme value models: fallacies and pitfalls” by Holger Drees and Anja Janßen provides a number of counterexamples to these results. This paper studies these counterexamples in a conditional extremes framework which involves marginal standardisation to a common exponentially decaying tailed marginal distribution. Our calculations show that some of the issues identified can be addressed in this way.
Citations: 0
Developing partnerships for academic data science consulting and collaboration units
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-01-11 DOI: 10.1002/sta4.644
Marianne Huebner, Laura Bond, Felesia Stukes, Joel Herndon, David J. Edwards, Gina-Maria Pomann
Data science consulting and collaboration units (DSUs) are core infrastructure for research at universities. Activities span data management, study design, data analysis, data visualization, predictive modelling, preparing reports, manuscript writing and advising on statistical methods and may include an experiential or teaching component. Partnerships are needed for a thriving DSU as an active part of the larger university network. Guidance for identifying, developing and managing successful partnerships for DSUs can be summarized in six rules: (1) align with institutional strategic plans, (2) cultivate partnerships that fit your mission, (3) ensure sustainability and prepare for growth, (4) define clear expectations in a partnership agreement, (5) communicate and (6) expect the unexpected. While these rules are not exhaustive, they are derived from experiences in a diverse set of DSUs, which vary by administrative home, mission, staffing and funding model. As examples in this paper illustrate, these rules can be adapted to different organizational models for DSUs. Clear expectations in partnership agreements are essential for high quality and consistent collaborations and address core activities, duration, staffing, cost and evaluation. A DSU is an organizational asset that should involve thoughtful investment if the institution is to gain real value.
Citations: 0
Equivalence testing for multiple groups
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-01-10 DOI: 10.1002/sta4.645
Tony Pourmohamad, Herbert K. H. Lee
Testing for equivalence, rather than testing for a difference, is an important component of some scientific studies. While the focus of the existing literature is on comparing two groups for equivalence, real-world applications arise regularly that require testing across more than two groups. This paper reviews the existing approaches for testing across multiple groups and proposes a novel framework for multigroup equivalence testing under a Bayesian paradigm. This approach allows for a more scientifically meaningful definition of the equivalence margin and a more powerful test than the few existing alternatives. This approach also allows a new definition of equivalence based on future differences.
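To make the idea of a multigroup equivalence margin concrete, here is a minimal sketch of one possible Bayesian check. It is not the framework proposed in the paper, whose details the abstract does not give. It assumes independent normal groups with a standard noninformative prior (so each group mean has a Student-t posterior) and defines equivalence, purely for illustration, as the spread between the largest and smallest group mean falling below a margin. The function name `prob_equivalent` and its parameters are hypothetical.

```python
import numpy as np


def prob_equivalent(samples, margin, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(max_i mu_i - min_i mu_i < margin | data).

    Illustrative assumption: each group is normal with unknown mean and
    variance under a noninformative prior, so the posterior of the group
    mean is mean(x) + s / sqrt(n) * t_{n-1}.
    """
    rng = np.random.default_rng(seed)
    draws = []
    for x in samples:
        x = np.asarray(x, dtype=float)
        n = x.size
        # Draw from the Student-t posterior of this group's mean.
        t = rng.standard_t(df=n - 1, size=n_draws)
        draws.append(x.mean() + x.std(ddof=1) / np.sqrt(n) * t)
    mu = np.column_stack(draws)              # shape: (n_draws, n_groups)
    spread = mu.max(axis=1) - mu.min(axis=1)  # range of group means per draw
    return float(np.mean(spread < margin))
```

One could declare the groups equivalent when this posterior probability exceeds a chosen threshold (for example 0.95). Both the threshold and the margin are study-specific choices, not values taken from the paper.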
Citations: 0