
Annals of Applied Statistics: Latest Articles

A LATENT VARIABLE MIXTURE MODEL FOR COMPOSITION-ON-COMPOSITION REGRESSION WITH APPLICATION TO CHEMICAL RECYCLING.
IF 1.4 | Zone 4, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2024-12-01 | Epub Date: 2024-10-31 | DOI: 10.1214/24-aoas1935
Nicholas Rios, Lingzhou Xue, Xiang Zhan

It is quite common to encounter compositional data in a regression framework in data analysis. When both responses and predictors are compositional, most existing models rely on a family of log-ratio based transformations to move the analysis from the simplex to the reals. This often makes the interpretation of the model more complex. A transformation-free regression model was recently developed, but it only allows for a single compositional predictor. However, many datasets include multiple compositional predictors of interest. Motivated by an application to hydrothermal liquefaction (HTL) data, a novel extension of this transformation-free regression model is provided that allows for two (or more) compositional predictors to be used via a latent variable mixture. A modified expectation-maximization algorithm is proposed to estimate model parameters, which are shown to have natural interpretations. Conformal inference is used to obtain prediction limits on the compositional response. The resulting methodology is applied to the HTL dataset. Extensions to multiple predictors are discussed.
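The log-ratio transformations that the abstract contrasts with the transformation-free model can be illustrated with the centered log-ratio (clr) transform. The following is a minimal sketch of that baseline idea only, not the authors' latent variable mixture model; the composition `x` is a hypothetical example.

```python
import math

def clr(composition):
    """Centered log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def clr_inverse(z):
    """Map clr coordinates back to the simplex by exponentiating and renormalizing."""
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

x = [0.2, 0.3, 0.5]      # hypothetical composition: positive parts summing to 1
z = clr(x)               # real-valued coordinates that sum to zero
x_back = clr_inverse(z)  # recovers the original composition
```

Coefficients fitted on `z` describe changes in log-ratios rather than in the parts themselves, which is the interpretability cost the abstract refers to.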

Annals of Applied Statistics, vol. 18, no. 4, pp. 3253-3273. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12448131/pdf/
Citations: 0
A SEMIPARAMETRIC METHOD FOR RISK PREDICTION USING INTEGRATED ELECTRONIC HEALTH RECORD DATA.
IF 1.3 | Zone 4, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2024-12-01 | Epub Date: 2024-10-31 | DOI: 10.1214/24-AOAS1938
Jill Hasler, Yanyuan Ma, Yizheng Wei, Ravi Parikh, Jinbo Chen

When using electronic health records (EHRs) for clinical and translational research, additional data is often available from external sources to enrich the information extracted from EHRs. For example, academic biobanks have more granular data available, and patient reported data is often collected through small-scale surveys. It is common that the external data is available only for a small subset of patients who have EHR information. We propose efficient and robust methods for building and evaluating models for predicting the risk of binary outcomes using such integrated EHR data. Our method is built upon an idea derived from the two-phase design literature that modeling the availability of a patient's external data as a function of an EHR-based preliminary predictive score leads to effective utilization of the EHR data. Through both theoretical and simulation studies, we show that our method has high efficiency for estimating log-odds ratio parameters, the area under the ROC curve, as well as other measures for quantifying predictive accuracy. We apply our method to develop a model for predicting the short-term mortality risk of oncology patients, where the data was extracted from the University of Pennsylvania hospital system EHR and combined with survey-based patient reported outcome data.
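One of the accuracy measures named above, the area under the ROC curve, has a simple empirical form when every patient is fully observed. The sketch below shows that plain version as an assumption-laden illustration; it is not the authors' semiparametric estimator, and the scores and labels are made up.

```python
def auc(scores, labels):
    """Empirical AUC: fraction of (case, control) pairs in which the case
    has the higher score, with ties counted as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]  # hypothetical predicted risks
labels = [1, 1, 1, 0, 0, 0]               # 1 = event, 0 = no event
result = auc(scores, labels)              # 8 of the 9 pairs are ordered correctly
```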

Annals of Applied Statistics, vol. 18, no. 4, pp. 3318-3337. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11934126/pdf/
Citations: 0
MODELING TRAJECTORIES USING FUNCTIONAL LINEAR DIFFERENTIAL EQUATIONS.
IF 1.3 | Zone 4, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2024-12-01 | Epub Date: 2024-10-31 | DOI: 10.1214/24-aoas1943
Julia Wrobel, Britton Sauerbrei, Eric A Kirk, Jian-Zhong Guo, Adam Hantman, Jeff Goldsmith

We are motivated by a study that seeks to better understand the dynamic relationship between muscle activation and paw position during locomotion. For each gait cycle in this experiment, activation in the biceps and triceps is measured continuously and in parallel with paw position as a mouse trots on a treadmill. We propose an innovative general regression method that draws from both ordinary differential equations and functional data analysis to model the relationship between these functional inputs and responses as a dynamical system that evolves over time. Specifically, our model addresses gaps in both literatures and borrows strength across curves by estimating ODE parameters across all curves simultaneously rather than modeling each functional observation separately. Our approach compares favorably to related functional data methods in simulations and in cross-validated predictive accuracy of paw position in the gait data. In the analysis of the gait cycles, we find that paw speed and position are dynamically influenced by inputs from the biceps and triceps muscles and that the effect of muscle activation persists beyond the activation itself.
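The remark that an input's effect can outlast the input itself is easy to see in a toy linear ODE. The sketch below assumes a first-order system x'(t) = -a·x(t) + b·u(t) with constant coefficients and a hypothetical pulse input; it is only an illustration of the dynamical-system idea, not the authors' functional model or estimation procedure.

```python
def simulate(a, b, u, t_end, dt=0.001):
    """Forward-Euler solution of x'(t) = -a*x(t) + b*u(t) with x(0) = 0."""
    n_steps = int(round(t_end / dt))
    x = 0.0
    path = [x]
    for i in range(n_steps):
        t = i * dt
        x += dt * (-a * x + b * u(t))
        path.append(x)
    return path

def pulse(t):
    """Hypothetical activation: switched on for the first half second only."""
    return 1.0 if t < 0.5 else 0.0

path = simulate(a=2.0, b=1.0, u=pulse, t_end=1.0)
x_at_offset = path[500]   # state when the activation switches off
x_at_end = path[1000]     # state half a second later, still above zero
```

After the pulse ends the state decays exponentially rather than dropping to zero, which is the kind of lingering effect the abstract describes.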

Annals of Applied Statistics, vol. 18, no. 4, pp. 3425-3443. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11934208/pdf/
Citations: 0
INDIVIDUAL DYNAMIC PREDICTION FOR CURE AND SURVIVAL BASED ON LONGITUDINAL BIOMARKERS.
IF 1.4 | Zone 4, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2024-12-01 | Epub Date: 2024-10-31 | DOI: 10.1214/24-aoas1906
Can Xie, Xuelin Huang, Ruosha Li, Alexander Tsodikov, Kapil Bhalla

To optimize personalized treatment strategies and extend patients' survival times, it is critical to accurately predict patients' prognoses at all stages, from disease diagnosis to follow-up visits. The longitudinal biomarker measurements during visits are essential for this prediction purpose. Patients' ultimate concerns are cure and survival. However, in many situations, there is no clear biomarker indicator for cure. We propose a comprehensive joint model of longitudinal and survival data and a landmark cure model, incorporating proportions of potentially cured patients. The survival distributions in the joint and landmark models are specified through flexible hazard functions with the proportional hazards as a special case, allowing other patterns such as crossing hazard and survival functions. Formulas are provided for predicting each individual's probabilities of future cure and survival at any time point based on his or her current biomarker history. Simulations show that, with these comprehensive and flexible properties, the proposed cure models outperform standard cure models in terms of predictive performance, measured by the time-dependent area under the curve of receiver operating characteristic, Brier score, and integrated Brier score. The use and advantages of the proposed models are illustrated by their application to a study of patients with chronic myeloid leukemia.
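The Brier score mentioned above is, in its simplest form, a mean squared error between predicted probabilities and observed outcomes. The sketch below assumes a single prediction horizon and no censoring, which is a simplification: time-dependent versions for survival data reweight observations to handle censoring, and the integrated Brier score averages over horizons. The numbers are hypothetical.

```python
def brier_score(predicted_survival, survived):
    """Mean squared difference between the predicted survival probability at a
    fixed horizon and the observed 0/1 survival status at that horizon."""
    n = len(predicted_survival)
    return sum((p - y) ** 2 for p, y in zip(predicted_survival, survived)) / n

predicted = [0.9, 0.7, 0.4, 0.2]  # hypothetical predicted survival probabilities
observed = [1, 1, 0, 0]           # 1 = alive at the horizon, 0 = not
score = brier_score(predicted, observed)  # lower is better; 0 is perfect
```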

Annals of Applied Statistics, vol. 18, no. 4, pp. 2796-2817. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11864788/pdf/
Citations: 0
STATISTICAL CURVE MODELS FOR INFERRING 3D CHROMATIN ARCHITECTURE.
IF 1.3 | Zone 4, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2024-12-01 | Epub Date: 2024-10-31 | DOI: 10.1214/24-AOAS1917
Elena Tuzhilina, Trevor Hastie, Mark Segal

Reconstructing three-dimensional (3D) chromatin structure from conformation capture assays (such as Hi-C) is a critical task in computational biology, since chromatin spatial architecture plays a vital role in numerous cellular processes and direct imaging is challenging. Most existing algorithms that operate on Hi-C contact matrices produce reconstructed 3D configurations in the form of a polygonal chain. However, none of the methods exploit the fact that the target solution is a (smooth) curve in 3D: this contiguity attribute is either ignored or indirectly addressed by imposing spatial constraints that are challenging to formulate. In this paper we develop both B-spline and smoothing spline techniques for directly capturing this potentially complex 1D curve. We subsequently combine these techniques with a Poisson model for contact counts and compare their performance on a real data example. In addition, motivated by the sparsity of Hi-C contact data, especially when obtained from single-cell assays, we appreciably extend the class of distributions used to model contact counts. We build a general distribution-based metric scaling (DBMS) framework from which we develop zero-inflated and Hurdle Poisson models as well as negative binomial applications. Illustrative applications make recourse to bulk Hi-C data from IMR90 cells and single-cell Hi-C data from mouse embryonic stem cells.
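The zero-inflated and hurdle Poisson distributions named above differ in how they treat zeros, which matters for sparse contact counts. The sketch below writes out the two textbook probability mass functions; the parameter names `lam` and `pi` and their values are assumptions for illustration, and nothing here reproduces the DBMS framework itself.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: with probability pi the count is a structural
    zero, otherwise it is drawn from Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base

def hurdle_pmf(k, lam, pi):
    """Hurdle Poisson: P(0) equals pi exactly, and positive counts follow a
    zero-truncated Poisson(lam)."""
    if k == 0:
        return pi
    return (1.0 - pi) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))

lam, pi = 1.5, 0.6  # hypothetical rate and zero-probability parameters
zip_total = sum(zip_pmf(k, lam, pi) for k in range(60))
hurdle_total = sum(hurdle_pmf(k, lam, pi) for k in range(60))
```

With the same `pi`, the zero-inflated model places more mass at zero than the hurdle model, because its Poisson component can also produce zeros.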

Annals of Applied Statistics, vol. 18, no. 4, pp. 2979-3006. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12209861/pdf/
Citations: 0
UTILIZING A CAPTURE-RECAPTURE STRATEGY TO ACCELERATE INFECTIOUS DISEASE SURVEILLANCE.
IF 1.3 | Zone 4, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2024-12-01 | Epub Date: 2024-10-31 | DOI: 10.1214/24-aoas1927
Lin Ge, Yuzi Zhang, Lance Waller, Robert Lyles

Monitoring key elements of disease dynamics (e.g., prevalence, case counts) is of great importance in infectious disease prevention and control, as emphasized during the COVID-19 pandemic. To facilitate this effort, we propose a new capture-recapture (CRC) analysis strategy that adjusts for misclassification stemming from the use of easily administered but imperfect diagnostic test kits, such as rapid antigen test kits or saliva tests. Our method is based on a recently proposed "anchor stream" design, whereby an existing voluntary surveillance data stream is augmented by a smaller and judiciously drawn random sample. It incorporates manufacturer-specified sensitivity and specificity parameters to account for imperfect diagnostic results in one or both data streams. For inference to accompany case count estimation, we improve upon traditional Wald-type confidence intervals by developing an adapted Bayesian credible interval for the CRC estimator that yields favorable frequentist coverage properties. When feasible, the proposed design and analytic strategy provides a more efficient solution than traditional CRC methods or random sampling-based bias-corrected estimation to monitor disease prevalence while accounting for misclassification. We demonstrate the benefits of this approach through simulation studies and a numerical example that underscore its potential utility in practice for economical disease monitoring among a registered closed population.
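The role of known sensitivity and specificity can be illustrated with a standard correction for a simple random sample, often called the Rogan-Gladen estimator. This is a sketch of the kind of bias-corrected baseline the abstract compares against, under the assumption of a single random sample; it is not the anchor-stream CRC estimator, and all inputs are hypothetical.

```python
def corrected_prevalence(n_positive, n_tested, sensitivity, specificity):
    """Rogan-Gladen correction of an apparent prevalence for test error,
    truncated to the interval [0, 1]."""
    apparent = n_positive / n_tested
    adjusted = (apparent + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return min(1.0, max(0.0, adjusted))

# Hypothetical survey: 120 positives among 1000 people tested
estimate = corrected_prevalence(n_positive=120, n_tested=1000,
                                sensitivity=0.85, specificity=0.97)
```

Here the apparent prevalence of 12% is adjusted to roughly 11%, since false positives and missed cases pull the raw proportion in opposite directions.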

Annals of Applied Statistics, vol. 18, no. 4, pp. 3130-3145. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12273866/pdf/
Citations: 0
A SPATIALLY VARYING HIERARCHICAL RANDOM EFFECTS MODEL FOR LONGITUDINAL MACULAR STRUCTURAL DATA IN GLAUCOMA PATIENTS.
IF 1.4 | Zone 4, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2024-12-01 | Epub Date: 2024-10-31 | DOI: 10.1214/24-aoas1944
Erica Su, Robert E Weiss, Kouros Nouri-Mahdavi, Andrew J Holbrook

We model longitudinal macular thickness measurements to monitor the course of glaucoma and prevent vision loss due to disease progression. The macular thickness varies over a 6 × 6 grid of locations on the retina, with additional variability arising from the imaging process at each visit. Currently, ophthalmologists estimate slopes using repeated simple linear regression for each subject and location. To estimate slopes more precisely, we develop a novel Bayesian hierarchical model for multiple subjects with spatially varying population-level and subject-level coefficients, borrowing information over subjects and measurement locations. We augment the model with visit effects to account for observed spatially correlated visit-specific errors. We model spatially varying: (a) intercepts, (b) slopes, and (c) log-residual standard deviations (SD) with multivariate Gaussian process priors with Matérn cross-covariance functions. Each marginal process assumes an exponential kernel with its own SD and spatial correlation matrix. We develop our models for and apply them to data from the Advanced Glaucoma Progression Study. We show that including visit effects in the model reduces error in predicting future thickness measurements and greatly improves model fit.
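The exponential kernel mentioned above (the Matérn kernel with smoothness 1/2) gives a covariance that decays with distance between grid locations. The sketch below builds such a matrix for a 6 × 6 grid, assuming unit spacing and hypothetical values for the SD and length scale; it shows one marginal kernel only, not the multivariate cross-covariance model.

```python
import math

def exponential_kernel(locations, sd, length_scale):
    """Covariance matrix with entries sd^2 * exp(-distance / length_scale)."""
    n = len(locations)
    cov = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(locations):
        for j, (xj, yj) in enumerate(locations):
            dist = math.hypot(xi - xj, yi - yj)
            cov[i][j] = sd ** 2 * math.exp(-dist / length_scale)
    return cov

# 36 grid locations with unit spacing (an assumption of this sketch)
grid = [(row, col) for row in range(6) for col in range(6)]
cov = exponential_kernel(grid, sd=2.0, length_scale=1.5)  # hypothetical values
```

Neighboring locations are strongly correlated and opposite corners only weakly, which is how the model borrows information across nearby measurement sites.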

Annals of Applied Statistics, vol. 18, no. 4, pp. 3444-3466. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11864210/pdf/
Citations: 0
EXPOSURE EFFECTS ON COUNT OUTCOMES WITH OBSERVATIONAL DATA, WITH APPLICATION TO INCARCERATED WOMEN.
IF 1.4 | Zone 4, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2024-09-01 | Epub Date: 2024-08-05 | DOI: 10.1214/24-aoas1874
Bonnie E Shook-Sa, Michael G Hudgens, Andrea K Knittel, Andrew Edmonds, Catalina Ramirez, Stephen R Cole, Mardge Cohen, Adebola Adedimeji, Tonya Taylor, Katherine G Michel, Andrea Kovacs, Jennifer Cohen, Jessica Donohue, Antonina Foster, Margaret A Fischl, Dustin Long, Adaora A Adimora

Causal inference methods can be applied to estimate the effect of a point exposure or treatment on an outcome of interest using data from observational studies. For example, in the Women's Interagency HIV Study, it is of interest to understand the effects of incarceration on the number of sexual partners and the number of cigarettes smoked after incarceration. In settings like this where the outcome is a count, the estimand is often the causal mean ratio, i.e., the ratio of the counterfactual mean count under exposure to the counterfactual mean count under no exposure. This paper considers estimators of the causal mean ratio based on inverse probability of treatment weights, the parametric g-formula, and doubly robust estimation, each of which can account for overdispersion, zero-inflation, and heaping in the measured outcome. Methods are compared in simulations and are applied to data from the Women's Interagency HIV Study.
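The causal mean ratio and inverse probability of treatment weighting can be sketched on a toy dataset. The example below assumes a single discrete confounder and estimates the propensity score by stratum frequencies instead of a fitted model; it does not address overdispersion, zero-inflation, or heaping, and the records are invented for illustration.

```python
from collections import defaultdict

def iptw_mean_ratio(records):
    """records: list of (confounder, exposed, count) tuples with a discrete
    confounder and binary exposure. Returns the weighted (Hajek) estimate of
    E[Y(1)] / E[Y(0)]."""
    n_by_stratum = defaultdict(int)
    n_exposed = defaultdict(int)
    for z, a, _ in records:
        n_by_stratum[z] += 1
        n_exposed[z] += a

    weighted_sum = {0: 0.0, 1: 0.0}
    weight_total = {0: 0.0, 1: 0.0}
    for z, a, y in records:
        p_exposed = n_exposed[z] / n_by_stratum[z]  # stratum propensity score
        weight = 1.0 / (p_exposed if a == 1 else 1.0 - p_exposed)
        weighted_sum[a] += weight * y
        weight_total[a] += weight
    mean_exposed = weighted_sum[1] / weight_total[1]
    mean_unexposed = weighted_sum[0] / weight_total[0]
    return mean_exposed / mean_unexposed

# Hypothetical data: exposure is rarer in stratum 0 and common in stratum 1
records = [
    (0, 1, 4), (0, 0, 2), (0, 0, 2), (0, 0, 2),
    (1, 1, 8), (1, 1, 8), (1, 1, 8), (1, 0, 4),
]
ratio = iptw_mean_ratio(records)
```

In this toy data the unweighted ratio of group means is 7 / 2.5 = 2.8, while the weighted estimate recovers the within-stratum ratio of 2, showing how weighting removes the confounding by stratum.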

Annals of Applied Statistics, vol. 18, no. 3, pp. 2147-2165. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526847/pdf/
Citations: 0
PROBABILISTIC CONTRASTIVE DIMENSION REDUCTION FOR CASE-CONTROL STUDY DATA.
IF 1.4 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-09-01 Epub Date: 2024-08-05 DOI: 10.1214/24-aoas1877
Didong Li, Andrew Jones, Barbara Engelhardt

Case-control experiments are essential to the scientific method, as they allow researchers to test biological hypotheses by looking for differences in outcome between cases and controls. It is then of interest to characterize variation that is enriched in a "foreground" (case) dataset relative to a "background" (control) dataset. For example, in a genomics context, the goal is to identify low-dimensional transcriptional structure unique to patients with certain disease (cases) vs. those without that disease (controls). In this work we propose probabilistic contrastive principal component analysis (PCPCA), a probabilistic dimension reduction method designed for case-control data. We describe inference in PCPCA through a contrastive likelihood and show that our model generalizes PCA, probabilistic PCA, and contrastive PCA. We discuss how to set the tuning parameter in theory and in practice, and we show several of PCPCA's advantages in the analysis of case-control data over related methods, including greater interpretability, uncertainty quantification and principled inference, robustness to noise and missing data, and the ability to generate "foreground-enriched" data from the model. We demonstrate PCPCA's performance on case-control data through a series of simulations, and we successfully identify variation specific to case data in genomic case-control experiments with data modalities, including gene expression, protein expression, and images.
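
For orientation, here is a minimal sketch of plain (non-probabilistic) contrastive PCA, one of the methods the abstract says PCPCA generalizes. It is not PCPCA itself: it takes the leading eigenvectors of the foreground covariance minus a multiple of the background covariance. The function name `contrastive_pca` and the parameter name `alpha` are assumptions made for this example.

```python
import numpy as np


def contrastive_pca(fg, bg, alpha, n_components):
    """Leading eigenvectors of cov(fg) - alpha * cov(bg).

    fg, bg : arrays of shape (n_samples, n_features) for the foreground
             (case) and background (control) datasets
    alpha  : contrast strength; alpha = 0 gives ordinary PCA on fg
    Returns an (n_features, n_components) matrix of directions.
    """
    fg = np.asarray(fg, dtype=float)
    bg = np.asarray(bg, dtype=float)

    # Center each dataset separately before forming covariances.
    fg_c = fg - fg.mean(axis=0)
    bg_c = bg - bg.mean(axis=0)
    cov_fg = fg_c.T @ fg_c / (fg.shape[0] - 1)
    cov_bg = bg_c.T @ bg_c / (bg.shape[0] - 1)

    # The contrast matrix is symmetric, so eigh applies; it returns
    # eigenvalues in ascending order, so pick the largest ones.
    eigvals, eigvecs = np.linalg.eigh(cov_fg - alpha * cov_bg)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]
```

Directions with high variance in both datasets are discounted, so the returned components emphasize variation specific to the foreground data. The choice of `alpha` plays the role of the tuning parameter that the abstract discusses for PCPCA.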

Annals of Applied Statistics, vol. 18, no. 3, pp. 2207-2229. Publication date: 2024-09-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12700624/pdf/
Citations: 0
SEMIPARAMETRIC LINEAR REGRESSION WITH AN INTERVAL-CENSORED COVARIATE IN THE ATHEROSCLEROSIS RISK IN COMMUNITIES STUDY.
IF 1.3 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-09-01 DOI: 10.1214/24-aoas1881
Richard Sizelove, Donglin Zeng, Dan-Yu Lin

In longitudinal studies, investigators are often interested in understanding how the time since the occurrence of an intermediate event affects a future outcome. The intermediate event is often asymptomatic such that its occurrence is only known to lie in a time interval induced by periodic examinations. We propose a linear regression model that relates the time since the occurrence of the intermediate event to a continuous response at a future time point through a rectified linear unit activation function while formulating the distribution of the time to the occurrence of the intermediate event through the Cox proportional hazards model. We consider nonparametric maximum likelihood estimation with an arbitrary sequence of examination times for each subject. We present an EM algorithm that converges stably for arbitrary datasets. The resulting estimators of regression parameters are consistent, asymptotically normal, and asymptotically efficient. We assess the performance of the proposed methods through extensive simulation studies and provide an application to the Atherosclerosis Risk in Communities Study.
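
One way to read the model structure described above is sketched below. The notation is assumed for illustration and may differ from the paper's: the response depends on the time elapsed since the intermediate event through a rectified linear term, which is zero when the event has not yet occurred by the measurement time, and the event time follows a Cox proportional hazards model.

```latex
% Assumed notation, not taken from the paper:
%   Y    continuous response measured at time t_0
%   T    time of the intermediate event, known only to lie in an interval (L, R]
%   X    baseline covariates
\begin{align*}
Y &= \alpha + \beta\,(t_0 - T)_{+} + \gamma^{\top} X + \varepsilon,
  \qquad (u)_{+} = \max(u, 0), \\
\lambda(t \mid X) &= \lambda_0(t)\,\exp\!\left(\theta^{\top} X\right).
\end{align*}
```

Because T is observed only up to an interval, the likelihood has to integrate over its possible values, which is the kind of missing-data structure an EM algorithm is typically used for.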

Annals of Applied Statistics, vol. 18, no. 3, pp. 2295-2306. Publication date: 2024-09-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12272158/pdf/
Citations: 0