
Annals of Applied Statistics: Latest Articles

INTEGRATIVE ECOLOGICAL REGRESSION ANALYSIS OF U.S. COUNTY AND STATE LEVEL COVID-19 DEATH DATA FOR STUDYING HEALTH DISPARITY ASSOCIATIONS.
IF 1.4; Zone 4 (Mathematics); Q2 STATISTICS & PROBABILITY; Pub Date: 2025-09-01; Epub Date: 2025-08-28; DOI: 10.1214/25-aoas2055
Daniel Li, Xihong Lin

It is of substantial interest to study health disparity associations with COVID-19 death rates. Although high-quality individual-level COVID-19 epidemiological data have been difficult to collect on a national scale, all United States (U.S.) counties have reported total COVID-19 death counts. A standard ecological analysis would then regress county total death counts on county-level covariates such as age, sex, and race percentages. However, such an analysis is limited by ecological bias and fallacy, in which estimated county-level associations differ from individual-level associations. Fortunately, state-level age, sex, and race specific COVID-19 death counts are also available for all U.S. states, so this information can be integrated with county-level data for more informative ecological analyses. We propose an approximate log-linear random effects model to jointly model county-level total death counts and state-level age, sex, and race specific death counts. We then develop a penalized composite log-likelihood method for parameter estimation and perform simulation studies to evaluate our proposed approach. Lastly, we analyze COVID-19 death data from the entire U.S., show how incorporating state-level counts can prevent ecological bias and fallacy, and illustrate the heterogeneity in health disparity associations across different U.S. states.
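
The ecological bias mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' model (which adds random effects and a penalized composite likelihood); it only assumes two hypothetical age strata with fixed individual-level death rates, and compares the true individual-level log rate ratio with the slope a county-level log-linear regression on the old-age proportion would recover from two counties. All function names and numbers are illustrative assumptions.

```python
import math

def county_rate(p_old, rate_young, rate_old):
    """Expected county death rate when each stratum keeps its own individual-level rate."""
    return (1.0 - p_old) * rate_young + p_old * rate_old

def naive_ecological_slope(p1, p2, rate_young, rate_old):
    """Slope of log(county rate) on the county's old-age proportion, using two counties."""
    r1 = county_rate(p1, rate_young, rate_old)
    r2 = county_rate(p2, rate_young, rate_old)
    return (math.log(r2) - math.log(r1)) / (p2 - p1)

# Hypothetical individual-level death rates (illustrative values only).
RATE_YOUNG, RATE_OLD = 0.001, 0.02

true_log_rate_ratio = math.log(RATE_OLD / RATE_YOUNG)
ecological_slope = naive_ecological_slope(0.1, 0.3, RATE_YOUNG, RATE_OLD)

print(f"individual-level log rate ratio: {true_log_rate_ratio:.3f}")
print(f"county-level log-linear slope:   {ecological_slope:.3f}")
```

The two printed numbers differ because the county rate is linear, not log-linear, in the stratum proportions; stratum-specific counts at a coarser level (here, state level) are one source of information for separating the two.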

Annals of Applied Statistics, vol. 19, no. 3, pp. 2320-2338. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12900166/pdf/
Citations: 0
BAYESIAN LEARNING OF CLINICALLY MEANINGFUL SEPSIS PHENOTYPES IN NORTHERN TANZANIA.
IF 1.4; Zone 4 (Mathematics); Q2 STATISTICS & PROBABILITY; Pub Date: 2025-09-01; Epub Date: 2025-08-28; DOI: 10.1214/25-aoas2045
Alexander Dombowsky, David B Dunson, Deng B Madut, Matthew P Rubach, Amy H Herring

Sepsis is a life-threatening condition caused by a dysregulated host response to infection. Recently, researchers have hypothesized that sepsis consists of a heterogeneous spectrum of distinct subtypes, motivating several studies to identify clusters of sepsis patients that correspond to subtypes, with the long-term goal of using these clusters to design subtype-specific treatments. Therefore, clinicians rely on clusters having a concrete medical interpretation, usually corresponding to clinically meaningful regions of the sample space that have a concrete implication to practitioners. In this article, we propose Clustering Around Meaningful Regions (CLAMR), a Bayesian clustering approach that explicitly models the medical interpretation of each cluster center. CLAMR favors clusterings that can be summarized via meaningful feature values, leading to medically significant sepsis patient clusters. We also provide details on measuring the effect of each feature on the clustering using Bayesian hypothesis tests, so one can assess what features are relevant for cluster interpretation. Our focus is on clustering sepsis patients from Moshi, Tanzania, where patients are younger and the prevalence of HIV infection is higher than in previous sepsis subtyping cohorts.

Annals of Applied Statistics, vol. 19, no. 3, pp. 2193-2217. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12422288/pdf/
Citations: 0
BAYESIAN DIFFERENTIAL CAUSAL DIRECTED ACYCLIC GRAPHS FOR OBSERVATIONAL ZERO-INFLATED COUNTS WITH AN APPLICATION TO TWO-SAMPLE SINGLE-CELL DATA.
IF 1.4; Zone 4 (Mathematics); Q2 STATISTICS & PROBABILITY; Pub Date: 2025-09-01; Epub Date: 2025-08-28; DOI: 10.1214/25-aoas2042
Junsouk Choi, Robert S Chapkin, Yang Ni

Observational zero-inflated count data arise in a wide range of areas such as genomics. One of the common research questions is to identify causal relationships by learning the structure of a sparse directed acyclic graph (DAG). While structure learning of DAGs has been an active research area, existing methods do not adequately account for excessive zeros and therefore are not suitable for modeling zero-inflated count data. Moreover, it is often interesting to study differences in the causal networks for data collected from two experimental groups (control vs treatment). To explicitly account for zero-inflation and identify differential causal networks, we propose a novel Bayesian differential zero-inflated negative binomial DAG (DAG0) model. We prove that the causal relationships under the proposed DAG0 are fully identifiable from purely observational, cross-sectional data, using a general proof technique that is applicable beyond the proposed model. Bayesian inference based on parallel-tempered Markov chain Monte Carlo is developed to efficiently explore the multi-modal posterior landscape. We demonstrate the utility of the proposed DAG0 by comparing it with state-of-the-art alternative methods through extensive simulations. An application in a single-cell RNA-sequencing dataset generated under two experimental groups finds some interesting results that appear to be consistent with existing knowledge. A user-friendly R package that implements DAG0 is available at https://github.com/junsoukchoi/BayesDAG0.git.
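
For readers unfamiliar with the zero-inflated negative binomial distribution named in this abstract, the following is a minimal sketch of its probability mass function. It is not the DAG0 model itself (which ties each node's distribution to its parents in the graph); the mean/dispersion parameterization and the function names are assumptions made for illustration.

```python
import math

def nb_log_pmf(y, mean, dispersion):
    """Log pmf of a negative binomial with the given mean and dispersion (size) parameter."""
    r = dispersion
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mean))
            + y * math.log(mean / (r + mean)))

def zinb_pmf(y, pi_zero, mean, dispersion):
    """pmf of a zero-inflated NB: a structural zero with probability pi_zero, else an NB draw."""
    nb = math.exp(nb_log_pmf(y, mean, dispersion))
    if y == 0:
        return pi_zero + (1.0 - pi_zero) * nb
    return (1.0 - pi_zero) * nb

# Illustrative parameters: 30% structural zeros, NB mean 3, dispersion 2.
probs = [zinb_pmf(y, 0.3, 3.0, 2.0) for y in range(500)]
print(f"P(Y = 0) = {probs[0]:.4f}, total mass = {sum(probs):.6f}")
```

The mass at zero always exceeds the inflation probability `pi_zero`, which is why a plain count model fitted to such data tends to understate the number of zeros.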

Annals of Applied Statistics, vol. 19, no. 3, pp. 1908-1930. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12395422/pdf/
Citations: 0
AVERAGED PREDICTION MODELS (APM): IDENTIFYING CAUSAL EFFECTS IN CONTROLLED PRE-POST SETTINGS WITH APPLICATION TO GUN POLICY.
IF 1.4; Zone 4 (Mathematics); Q2 STATISTICS & PROBABILITY; Pub Date: 2025-09-01; Epub Date: 2025-08-28; DOI: 10.1214/25-aoas2011
Thomas Leavitt, Laura A Hatfield

To investigate causal impacts, many researchers use controlled pre-post designs that compare over-time differences between a population exposed to a policy change and an unexposed comparison group. However, researchers using these designs often disagree about the "correct" specification of the causal model, perhaps most notably in analyses to identify the effects of gun policies on crime. To help settle these model specification debates, we propose a general identification framework that unifies a variety of models researchers use in practice. In this framework, which nests "brand name" designs like Difference-in-Differences as special cases, we use models to predict untreated outcomes and then correct the treated group's predictions using the comparison group's observed prediction errors. Our point identifying assumption is that treated and comparison groups would have equal prediction errors (in expectation) under no treatment. To choose among candidate models, we propose a data-driven procedure based on models' robustness to violations of this point identifying assumption. Our selection procedure averages over candidate models, weighting by each model's posterior probability of being the most robust given its differential average prediction errors in the pre-period. This approach offers a way out of debates over the "correct" model by choosing on robustness instead and has the desirable property of being feasible in the "locked box" of pre-intervention data only. We apply our methodology to the gun policy debate, focusing specifically on Missouri's 2007 repeal of its permit-to-purchase law, and provide an R package (apm) for implementation.
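
The correction step described in this abstract can be sketched in a few lines. This is a toy illustration, not the `apm` R package: the function names and the group-mean outcomes are hypothetical. It uses a "carry the last pre-period value forward" prediction model, under which the corrected estimate coincides with a two-period Difference-in-Differences contrast.

```python
def carry_forward(pre_period_outcomes):
    """Toy prediction model: predict the untreated post-period outcome as the last pre-period value."""
    return pre_period_outcomes[-1]

def corrected_effect(treated_obs, treated_pred, comparison_obs, comparison_pred):
    """Shift the treated prediction by the comparison group's prediction error, then compare."""
    comparison_error = comparison_obs - comparison_pred
    corrected_prediction = treated_pred + comparison_error
    return treated_obs - corrected_prediction

# Hypothetical group-mean outcomes (illustrative numbers, not from the paper).
treated_pre, treated_post = [4.0, 4.5, 5.0], 7.0
comparison_pre, comparison_post = [3.0, 3.2, 3.5], 4.0

apm_style = corrected_effect(treated_post, carry_forward(treated_pre),
                             comparison_post, carry_forward(comparison_pre))
did = (treated_post - treated_pre[-1]) - (comparison_post - comparison_pre[-1])

print(f"corrected-prediction estimate: {apm_style}")
print(f"two-period DiD estimate:       {did}")
```

Swapping `carry_forward` for another predictor (a pre-period mean, a linear trend) changes the estimate, which is the kind of model choice the abstract's averaging procedure is meant to address.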

Annals of Applied Statistics, vol. 19, no. 3, pp. 1826-1846. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12633725/pdf/
Citations: 0
SURROGATE SELECTION OVERSAMPLES EXPANDED T CELL CLONOTYPES.
IF 1.4; Zone 4 (Mathematics); Q2 STATISTICS & PROBABILITY; Pub Date: 2025-09-01; Epub Date: 2025-08-28; DOI: 10.1214/25-aoas2032
Peng Yu, Yumin Lian, Elliot Xie, Cindy L Zuleger, Richard J Albertini, Mark R Albertini, Michael A Newton

Surrogate selection is an experimental design that without sequencing any DNA can restrict a sample of cells to those carrying certain genomic mutations. In immunological disease studies, this design may provide a relatively easy approach to enrich a lymphocyte sample with cells relevant to the disease response because the emergence of neutral mutations associates with the proliferation history of clonal subpopulations. A statistical analysis of clonotype sizes provides a structured, quantitative perspective on this useful property of surrogate selection. Our model specification couples within-clonotype birth-death processes with an exchangeable model across clonotypes. Beyond enrichment questions about the surrogate selection design, our framework enables a study of sampling properties of elementary sample diversity statistics; it also points to new statistics that may usefully measure the burden of somatic genomic alterations associated with clonal expansion. We examine statistical properties of immunological samples governed by the coupled model specification, and we illustrate calculations in surrogate selection studies of melanoma and in single-cell genomic studies of T cell repertoires.
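
As background for the within-clonotype birth-death processes mentioned in this abstract, here is a minimal sketch of two textbook quantities for a linear birth-death process started from a single cell. The abstract does not state which birth-death process the authors use, so the constant per-cell rates, the function names, and the parameter values are all assumptions.

```python
import math

def expected_clone_size(birth, death, t):
    """Mean clone size at time t for a linear birth-death process started from one cell."""
    return math.exp((birth - death) * t)

def extinction_probability(birth, death, t):
    """Probability that a clone founded by one cell has died out by time t."""
    if birth == death:
        # Critical case: the general formula below would be 0/0.
        return birth * t / (1.0 + birth * t)
    growth = math.exp((birth - death) * t)
    return death * (growth - 1.0) / (birth * growth - death)

# Illustrative per-cell rates: births twice as frequent as deaths.
for t in (0.0, 1.0, 5.0, 50.0):
    print(t, expected_clone_size(1.0, 0.5, t), extinction_probability(1.0, 0.5, t))
```

For a growing clone (birth rate above death rate) the extinction probability approaches death/birth as time increases, so surviving clones are a size-biased subset of all founded clones.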

Annals of Applied Statistics, vol. 19, no. 3, pp. 1884-1907. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12481847/pdf/
Citations: 0
CONTRASTIVE LINEAR REGRESSION.
IF 1.4; Zone 4 (Mathematics); Q2 STATISTICS & PROBABILITY; Pub Date: 2025-09-01; Epub Date: 2025-08-28; DOI: 10.1214/24-aoas1977
Boyang Zhang, Sarah Nyquist, Andrew Jones, Barbara E Engelhardt, Didong Li

Contrastive dimension reduction methods have been developed for case-control study data to identify variation that is enriched in the foreground (case) data X relative to the background (control) data Y. Here we develop contrastive regression for the setting where there is a response variable r associated with each foreground observation. This situation occurs frequently when, for example, the unaffected controls do not have a disease grade or intervention dosage, but the affected cases have a disease grade or intervention dosage, as in autism severity, solid tumor stages, polyp sizes, or warfarin dosages. Our contrastive regression model captures shared low-dimensional variation between the predictors in the case and control groups and then explains the case-specific response variables through the variance that remains in the predictors after shared variation is removed. We show that, in one single-cell RNA sequencing dataset on cellular differentiation in chronic rhinosinusitis with and without nasal polyps and in another single-nucleus RNA sequencing dataset on autism severity in postmortem brain samples from donors with and without autism, our contrastive linear regression performs feature ranking and identifies biologically informative predictors associated with response that cannot be identified using other approaches.

Annals of Applied Statistics, vol. 19, no. 3, pp. 1868-1883. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12692120/pdf/
Citations: 0
NETWORK-BASED MODELING OF EMOTIONAL EXPRESSIONS FOR MULTIPLE CANCERS VIA A LINGUISTIC ANALYSIS OF AN ONLINE HEALTH COMMUNITY.
IF 1.4; Zone 4 (Mathematics); Q2 STATISTICS & PROBABILITY; Pub Date: 2025-09-01; Epub Date: 2025-08-28; DOI: 10.1214/25-aoas2047
Xinyan Fan, Mengque Liu, Shuangge Ma

The diagnosis and treatment of cancer can evoke a variety of adverse emotions. Online health communities (OHCs) provide a safe platform for cancer patients and those closely related to express emotions without fear of judgement or stigma. In the literature, linguistic analysis of OHCs is usually limited to a single disease and based on methods with various technical limitations. In this article we analyze posts from September 2003 to September 2022 on eight cancers that are publicly available at the American Cancer Society's Cancer Survivors Network (CSN). We propose a novel network analysis technique based on low-rank matrices. The proposed approach decomposes the emotional expression semantic networks into an across-cancer time-independent component (which describes the "baseline" that is shared by multiple cancers), a cancer-specific time-independent component (which describes cancer-specific properties), and an across-cancer time-dependent component (which accommodates temporal effects on multiple cancer communities). For the second and third components, respectively, we consider a novel clustering structure and a change point structure. A penalization approach is developed, and its theoretical and computational properties are carefully established. The analysis of the CSN data leads to sensible networks and deeper insights into emotions for cancer overall and specific cancer types.

Annals of Applied Statistics, vol. 19, no. 3, pp. 2218-2236. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12525517/pdf/
Citations: 0
KULLBACK-LEIBLER-BASED DISCRETE FAILURE TIME MODELS FOR INTEGRATION OF PUBLISHED PREDICTION MODELS WITH NEW TIME-TO-EVENT DATASET.
IF 1.4; Zone 4 (Mathematics); Q2 STATISTICS & PROBABILITY; Pub Date: 2025-06-01; Epub Date: 2025-05-28; DOI: 10.1214/24-aoas1955
Di Wang, Wen Ye, Randall Sung, Hui Jiang, Jeremy M G Taylor, Lisa Ly, Kevin He

Prediction of time-to-event data often suffers from rare event rates, small sample sizes, high dimensionality, and low signal-to-noise ratios. Incorporating published prediction models from external large-scale studies is expected to improve the performance of prognosis prediction from internal individual-level data. However, existing integration approaches typically assume that the underlying distributions of the external and internal data sources are similar, which is often invalid. To account for challenges, including heterogeneity, data sharing, and privacy constraints, we propose a failure time integration procedure, which utilizes a discrete hazard-based Kullback-Leibler discriminatory information measuring the discrepancy between the external models and the internal dataset. The asymptotic properties and simulation results show the advantage of the proposed method compared to those solely based on internal data. We apply the proposed method to improve prediction performance on a kidney transplant dataset from a local hospital by integrating this small-sized dataset with a published survival model obtained from the national transplant registry.
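
The idea of measuring discrepancy between an external model and internal data on the discrete hazard scale can be sketched as follows. This is not the authors' estimator: it assumes each discrete-time hazard is a Bernoulli event probability and simply sums the per-interval Kullback-Leibler divergences without weighting. The function names and hazard values are hypothetical.

```python
import math

def bernoulli_kl(p, q):
    """KL divergence KL(Bernoulli(p) || Bernoulli(q)) for 0 < p, q < 1."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def hazard_kl(external_hazards, internal_hazards):
    """Unweighted sum of per-interval Bernoulli KL divergences between two hazard sequences."""
    return sum(bernoulli_kl(p, q) for p, q in zip(external_hazards, internal_hazards))

def survival_curve(hazards):
    """Discrete-time survival: S(t) is the product of (1 - h_s) over intervals s <= t."""
    surv, curve = 1.0, []
    for h in hazards:
        surv *= 1.0 - h
        curve.append(surv)
    return curve

# Hypothetical hazards over four follow-up intervals.
external = [0.05, 0.08, 0.10, 0.12]   # implied by a published model
internal = [0.07, 0.08, 0.15, 0.10]   # estimated from a small local dataset

print("KL discrepancy:", hazard_kl(external, internal))
print("external survival:", survival_curve(external))
```

In a penalized fit, a term of this kind would pull the internal hazard estimates toward the external model, with a tuning parameter (not shown) controlling how much the external information is trusted.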

Annals of Applied Statistics, vol. 19, no. 2, pp. 1167-1189, 2025-06-01. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12797872/pdf/
Citations: 0
A DEEP NEURAL NETWORK TWO-PART MODEL AND FEATURE IMPORTANCE TEST FOR SEMI-CONTINUOUS DATA.
IF 1.4 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2025-06-01 Epub Date: 2025-05-28 DOI: 10.1214/25-aoas2013
Baiming Zou, Xinlei Mi, Shiyu Wan, Di Wu, James G Xenakis, Jianhua Hu, Fei Zou

Semi-continuous data frequently arise in clinical practice. For example, while many surgical patients still suffer from varying degrees of acute postoperative pain (POP) sometime after surgery (i.e., POP score > 0), others experience none (i.e., POP score = 0), indicating the existence of two distinct data processes at play. Existing parametric or semi-parametric two-part modeling methods for this type of semi-continuous data can fail to appropriately model the two underlying data processes as such methods rely heavily on (generalized) linear additive assumptions. However, many factors may interact to jointly influence the experience of POP non-additively and non-linearly. Motivated by this challenge and inspired by the flexibility of deep neural networks (DNN) to accurately approximate complex functions universally, we derive a DNN-based two-part model by adapting the conventional DNN methods with two additional components: a bootstrapping procedure along with a filtering algorithm to boost the stability of the conventional DNN, an approach we denote as sDNN. To improve the interpretability and transparency of sDNN, we further derive a feature importance testing procedure to identify important features associated with the outcome measurements of the two data processes, denoting this approach fsDNN. We show that fsDNN not only offers a statistical inference procedure for each feature under complex association but also that using the identified features can further improve the predictive performance of sDNN. The proposed sDNN- and fsDNN-based two-part models are applied to the analysis of real data from a POP study, in which application they clearly demonstrate advantages over the existing parametric and semi-parametric two-part models. 
Further, we conduct extensive numerical studies and draw comparisons with other machine learning methods to demonstrate that sDNN and fsDNN consistently outperform the existing two-part models and frequently used machine learning methods regardless of the data complexity. An R package implementing the proposed methods is available in the Supplementary Material (Zou et al., 2025) and on GitHub (https://github.com/BZou-lab/fsDNN).
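
The two-part structure described in the abstract above can be sketched in a few lines. This is a deliberately simplified illustration, not the sDNN or fsDNN method: each part is an intercept-only estimate rather than a neural network, and the function names and the toy POP scores are hypothetical.

```python
def fit_two_part(y):
    """Fit the two parts of a semi-continuous outcome with constants.

    Part 1: probability that the outcome is positive.
    Part 2: mean of the outcome among the positive observations.
    """
    positives = [v for v in y if v > 0]
    p_positive = len(positives) / len(y)
    mean_positive = sum(positives) / len(positives) if positives else 0.0
    return p_positive, mean_positive


def predict_mean(p_positive, mean_positive):
    """Combine the parts: E[Y] = P(Y > 0) * E[Y | Y > 0]."""
    return p_positive * mean_positive


# Hypothetical pain scores: half of the patients report no pain at all.
scores = [0, 0, 3, 5, 0, 4]
p_pos, mean_pos = fit_two_part(scores)
overall = predict_mean(p_pos, mean_pos)
```

In a covariate-dependent version, each constant would be replaced by a fitted function of the features (a classifier for part 1, a regression on the positive observations for part 2); the abstract's contribution is to use stabilized neural networks for both.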

Annals of Applied Statistics, vol. 19, no. 2, pp. 1314-1331, 2025-06-01. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12263096/pdf/
Citations: 0
BAYESIAN DATA AUGMENTATION FOR RECURRENT EVENTS UNDER INTERMITTENT ASSESSMENT IN OVERLAPPING INTERVALS WITH APPLICATIONS TO EMR DATA.
IF 1.4 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2025-06-01 Epub Date: 2025-05-28 DOI: 10.1214/24-aoas2007
Xin Liu, Patrick M Schnell

Electronic medical records (EMR) data contain rich information that can facilitate health-related studies but are collected primarily for purposes other than research. For recurrent events, EMR data often do not record event times or counts but only contain intermittently assessed and censored observations (i.e., upper and/or lower bounds for counts in a time interval) at uncontrolled times. This can result in non-contiguous or overlapping assessment intervals with censored event counts. Existing methods for analyzing intermittently assessed recurrent events assume disjoint assessment intervals with known counts (interval count data) due to a focus on prospective studies with controlled assessment times. We propose a Bayesian data augmentation method to analyze the complicated assessments in EMR data for recurrent events. Within a Gibbs sampler, event times are imputed by generating sets of event times from non-homogeneous Poisson processes and rejecting proposed sets that are incompatible with constraints imposed by assessment data. Based on the independent increments property of Poisson processes, we implement three techniques to speed up this rejection sampling imputation method for large EMR datasets: independent sampling by partitioning, truncated generation, and sequential sampling. In a simulation study, we show that our method accurately estimates parameters of log-linear Poisson process intensities. Although the proposed method can be applied generally to EMR data of recurrent events, our study is specifically motivated by identifying risk factors for falls due to cancer treatment and its supportive medications. We used the proposed method to analyze an EMR dataset comprising 5501 patients treated for breast cancer. Our analysis provides evidence supporting associations between certain risk factors (including classes of medications) and risk of falls.
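
The rejection step described in the abstract above can be sketched as follows. This is a plain, unaccelerated illustration under stated assumptions, not the authors' implementation: event times are proposed by standard thinning of a non-homogeneous Poisson process, and a proposal is kept only if it satisfies every count bound. The intensity function, its upper bound `lam_max`, the constraint format `(start, end, lower, upper)`, and all function names are hypothetical.

```python
import math
import random


def sample_nhpp(intensity, t_end, lam_max, rng):
    """Thinning: event times on [0, t_end) for an intensity bounded by lam_max."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(lam_max)  # candidate from a rate-lam_max process
        if t >= t_end:
            return times
        if rng.random() < intensity(t) / lam_max:  # keep with prob. intensity / bound
            times.append(t)


def compatible(times, constraints):
    """Check each (start, end, lower, upper) bound on the count in [start, end)."""
    for start, end, lower, upper in constraints:
        n = sum(start <= t < end for t in times)
        if n < lower or n > upper:
            return False
    return True


def impute_event_times(intensity, t_end, lam_max, constraints, rng, max_tries=100000):
    """Propose whole sets of event times until one satisfies all constraints."""
    for _ in range(max_tries):
        times = sample_nhpp(intensity, t_end, lam_max, rng)
        if compatible(times, constraints):
            return times
    raise RuntimeError("no compatible draw found within max_tries")


# Two overlapping assessment intervals with censored counts (toy values).
constraints = [(0.0, 6.0, 1, 3), (4.0, 10.0, 0, 2)]
rng = random.Random(0)
draw = impute_event_times(
    lambda t: 0.5 + 0.3 * math.sin(t),  # illustrative intensity, at most 0.8
    t_end=10.0,
    lam_max=0.8,
    constraints=constraints,
    rng=rng,
)
```

Proposing the whole trajectory at once becomes inefficient as the number of constraints grows, which is the motivation the abstract gives for partitioning, truncated generation, and sequential sampling.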

Annals of Applied Statistics, vol. 19, no. 2, pp. 1332-1361, 2025-06-01. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12393837/pdf/
Citations: 0