
Biometrics: Latest Articles

Clarifying the role of the Mantel-Haenszel risk difference estimator in randomized clinical trials.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-10-08 DOI: 10.1093/biomtc/ujaf142
Xiaoyu Qiu, Yuhan Qian, Jaehwan Yi, Jinqiu Wang, Yu Du, Yanyao Yi, Ting Ye

The Mantel-Haenszel (MH) risk difference estimator, commonly used in randomized clinical trials for binary outcomes, calculates a weighted average of stratum-specific risk difference estimators. Traditionally, this method requires the stringent assumption that risk differences are homogeneous across strata, also known as the common (constant) risk difference assumption. In our paper, we relax this assumption and adopt a modern perspective, viewing the MH risk difference estimator as an approach for covariate adjustment in randomized clinical trials, distinguishing its use from that in meta-analysis and observational studies. We demonstrate that, under reasonable restrictions on risk difference variability, the MH risk difference estimator consistently estimates the average treatment effect within a standard super-population framework, which is often the primary interest in randomized clinical trials, in addition to estimating a weighted average of stratum-specific risk differences. We rigorously study its properties under the large-stratum and sparse-stratum asymptotic regimes, as well as under mixed-regime settings. Furthermore, for either estimand, we propose a unified robust variance estimator that improves over the popular variance estimators by Greenland and Robins and Sato et al. and has provable consistency across these asymptotic regimes, regardless of assuming common risk differences. Extensions of our theoretical results also provide new insights into the MH test, the post-stratification estimator, and settings with multiple treatments. Our findings are thoroughly validated through simulations and a clinical trial example.
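The weighted average described in the abstract above can be sketched as follows. This is a minimal illustration of the textbook MH risk difference with weights n1*n0/(n1+n0) per stratum; it does not implement the robust variance estimator proposed in the paper, and the function name, the tuple layout, and the example counts are assumptions made here for illustration.

```python
from typing import Sequence, Tuple

# Each stratum is (events_treated, n_treated, events_control, n_control).
# This layout is an assumption of this sketch, not taken from the paper.
Stratum = Tuple[int, int, int, int]


def mh_risk_difference(strata: Sequence[Stratum]) -> float:
    """Mantel-Haenszel risk difference: a weighted average of stratum-specific
    risk differences with weights w_k = n1_k * n0_k / (n1_k + n0_k)."""
    numerator = 0.0
    denominator = 0.0
    for x1, n1, x0, n0 in strata:
        if n1 == 0 or n0 == 0:
            # A stratum with an empty arm has weight zero, so it is skipped.
            continue
        weight = n1 * n0 / (n1 + n0)
        numerator += weight * (x1 / n1 - x0 / n0)
        denominator += weight
    if denominator == 0:
        raise ValueError("no stratum has observations in both arms")
    return numerator / denominator


# Hypothetical counts, for illustration only.
example_strata = [(12, 40, 8, 40), (30, 60, 18, 60), (5, 20, 4, 20)]
estimate = mh_risk_difference(example_strata)
```

With the hypothetical counts above, the stratum weights are 20, 30, and 10, so the estimate is (20*0.10 + 30*0.20 + 10*0.05) / 60.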

Biometrics, vol. 81, no. 4 (Journal Article). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12576803/pdf/
Citations: 0
Distal causal excursion effects: modeling long-term effects of time-varying treatments in micro-randomized trials.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-10-08 DOI: 10.1093/biomtc/ujaf134
Tianchen Qian

Micro-randomized trials (MRTs) play a crucial role in optimizing digital interventions. In an MRT, each participant is sequentially randomized among treatment options hundreds of times. While the interventions tested in MRTs target short-term behavioral responses (proximal outcomes), their ultimate goal is to drive long-term behavior change (distal outcomes). However, existing causal inference methods, such as the causal excursion effect, are limited to proximal outcomes, making it challenging to quantify the long-term impact of interventions. To address this gap, we introduce the distal causal excursion effect (DCEE), a novel estimand that quantifies the long-term effect of time-varying treatments. The DCEE contrasts distal outcomes under two excursion policies while marginalizing over most treatment assignments, enabling a parsimonious and interpretable causal model even with a large number of decision points. We propose two estimators for the DCEE, one with cross-fitting and one without, both robust to misspecification of the outcome model. We establish their asymptotic properties and validate their performance through simulations. We apply our method to the HeartSteps MRT to assess the impact of activity prompts on long-term habit formation. Our findings suggest that prompts delivered earlier in the study have a stronger long-term effect than those delivered later, underscoring the importance of intervention timing in behavior change. This work provides the critically needed toolkit for scientists working on digital interventions to assess long-term causal effects using MRT data.

Biometrics, vol. 81, no. 4 (Journal Article).
Citations: 0
Entrywise splitting cross-validation in generalized factor models: from sample splitting to entrywise splitting.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-10-08 DOI: 10.1093/biomtc/ujaf153
Zhijing Wang

The generalized factor models have been widely employed for dimension reduction across various types of multivariate data, including binary choices, counts, and continuous observations. While determining the number of factors in such models has received significant scholarly attention, it remains an open challenge in the field. In this paper, we propose a cross-validation (CV) method based on entrywise splitting (ES), rather than sample splitting, to address this problem. Similar to traditional cross-validation, this approach primarily prevents the underestimation of the number of factors. We then introduce a penalized entrywise splitting cross-validation criterion, which integrates the original CV with information theoretic criteria by adding a penalty term. Its consistency is established under mild conditions in a high-dimensional setting, where both the sample size and the number of features grow to infinity. Furthermore, we extend our methodology to random missing data with different probability scenarios. We evaluate the performance of the proposed method through comprehensive simulations and apply it to a mouse brain single-cell RNA sequencing dataset.
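The contrast between sample splitting and entrywise splitting mentioned in the abstract above can be illustrated with fold assignments alone. The sketch below only shows how held-out sets differ between the two schemes; the abstract does not specify the paper's actual splitting scheme, fitting step, or penalty term, so the function names and the uniform random assignment are assumptions.

```python
import random
from typing import List


def entrywise_folds(n_rows: int, n_cols: int, n_folds: int, seed: int = 0) -> List[List[int]]:
    """Assign every matrix entry (i, j) to one of n_folds folds at random.

    Returns an n_rows x n_cols grid of fold labels; fold sizes differ by at most one.
    """
    rng = random.Random(seed)
    labels = [k % n_folds for k in range(n_rows * n_cols)]
    rng.shuffle(labels)
    return [labels[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]


def samplewise_folds(n_rows: int, n_cols: int, n_folds: int, seed: int = 0) -> List[List[int]]:
    """Classical sample splitting: each whole row (sample) belongs to one fold."""
    rng = random.Random(seed)
    row_labels = [i % n_folds for i in range(n_rows)]
    rng.shuffle(row_labels)
    return [[row_labels[i]] * n_cols for i in range(n_rows)]
```

Under entrywise splitting, a held-out entry still has observed neighbours in its own row and column, whereas under sample splitting a held-out row has no observed entries at all.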

Biometrics, vol. 81, no. 4 (Journal Article).
Citations: 0
Flexible Bayesian quantile regression for counts via generative modeling.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-10-08 DOI: 10.1093/biomtc/ujaf152
Yuta Yamauchi, Genya Kobayashi, Shonosuke Sugasawa

Count data frequently arises in biomedical applications, such as the length of hospital stay. However, their discrete nature poses significant challenges for appropriately modeling conditional quantiles, which are crucial for understanding heterogeneous effects and variability in outcomes. To solve the practical difficulty, we propose a novel general Bayesian framework for quantile regression tailored to count data. We seek the regression parameter on the conditional quantile by minimizing the expected loss with respect to the distribution of the conditional quantile of the latent continuous variable associated with the observed count response variable. By modeling the unknown conditional distribution through a Bayesian nonparametric kernel mixture for the joint distribution of the count response and covariates, we obtain the posterior distribution of the regression parameter via a simple optimization. We numerically demonstrate that the proposed method improves bias and estimation accuracy of the existing crude approaches to count quantile regression. Furthermore, we analyze the length of hospital stay for acute myocardial infarction and demonstrate that the proposed method gives more interpretable and flexible results than the existing ones.
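Quantile regression is built on minimizing an expected check (pinball) loss, which is the kind of loss the abstract above refers to. The sketch below shows only that standard loss and the fact that its minimizer over a sample is an empirical quantile; it is not the paper's Bayesian nonparametric procedure, and the function names and the example counts are assumptions for illustration.

```python
from typing import Sequence


def check_loss(residual: float, tau: float) -> float:
    """Check (pinball) loss: tau * r for r >= 0 and (tau - 1) * r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def empirical_quantile_by_check_loss(values: Sequence[float], tau: float) -> float:
    """Return the sample value that minimizes the total check loss at level tau."""
    candidates = sorted(values)
    return min(candidates, key=lambda q: sum(check_loss(v - q, tau) for v in values))


# Hypothetical lengths of stay in days, for illustration only.
stays = [0, 1, 1, 2, 3, 5, 8]
median_stay = empirical_quantile_by_check_loss(stays, 0.5)
upper_stay = empirical_quantile_by_check_loss(stays, 0.9)
```

Because counts take only integer values, the minimizer moves in jumps as tau changes, which is one way to see why the discreteness mentioned in the abstract complicates modeling conditional quantiles.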

Biometrics, vol. 81, no. 4 (Journal Article).
Citations: 0
Super learner for survival prediction in case-cohort and generalized case-cohort studies.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-10-08 DOI: 10.1093/biomtc/ujaf155
Haolin Li, Haibo Zhou, David Couper, Jianwen Cai

The case-cohort study design is often used in modern epidemiological studies of rare diseases, as it can achieve similar efficiency as a much larger cohort study with a fraction of the cost. Previous work focused on parameter estimation for case-cohort studies based on a particular statistical model, but few discussed the survival prediction problem under such type of design. In this article, we propose a super learner algorithm for survival prediction in case-cohort studies. We further extend our proposed algorithm to generalized case-cohort studies. The proposed super learner algorithm is shown to have asymptotic model selection consistency as well as uniform consistency. We also demonstrate our algorithm has satisfactory finite sample performances. Simulation studies suggest that the proposed super learners trained by data from case-cohort and generalized case-cohort studies have better prediction accuracy than the ones trained by data from the simple random sampling design with the same sample sizes. Finally, we apply the proposed method to analyze a generalized case-cohort study conducted as part of the Atherosclerosis Risk in Communities Study.

Biometrics, vol. 81, no. 4 (Journal Article). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12665972/pdf/
Citations: 0
A semiparametric method for addressing underdiagnosis using electronic health record data.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-10-08 DOI: 10.1093/biomtc/ujaf157
Weidong Ma, Jordana B Cohen, Jinbo Chen

Effective treatment of medical conditions begins with an accurate diagnosis. However, many conditions are often underdiagnosed, either being overlooked or diagnosed after significant delays. Electronic health records (EHRs) contain extensive patient health information, offering an opportunity to probabilistically identify underdiagnosed individuals. The rationale is that both diagnosed and underdiagnosed patients may display similar health profiles in EHR data, distinguishing them from condition-free patients. Thus, EHR data can be leveraged to develop models that assess an individual's risk of having a condition. To date, this opportunity has largely remained unexploited, partly due to the lack of suitable statistical methods. The key challenge is the positive-unlabeled EHR data structure, which consists of data for diagnosed ("positive") patients and the remaining ("unlabeled") that include underdiagnosed patients and many condition-free patients. Therefore, data for patients who are unambiguously condition-free, essential for developing risk assessment models, are unavailable. To overcome this challenge, we propose ascertaining condition statuses for a small subset of unlabeled patients. We develop a novel statistical method for building accurate models using this supplemented EHR data to estimate the probability that a patient has the condition of interest. We study the asymptotic properties of our method and assess its finite-sample performance through simulation studies. Finally, we apply our method to develop a preliminary model for identifying potentially underdiagnosed non-alcoholic steatohepatitis patients using data from Penn Medicine EHRs.

Biometrics, vol. 81, no. 4 (Journal Article). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12665971/pdf/
Citations: 0
Structuring, sequencing, staging, selecting: the 4S method for the longitudinal analysis of multidimensional questionnaires in chronic diseases.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-10-08 DOI: 10.1093/biomtc/ujaf163
Tiphaine Saulnier, Wassilios G Meissner, Margherita Fabbri, Alexandra Foubert-Samier, Cécile Proust-Lima

In clinical studies, questionnaires are often used to report disease-related manifestations from clinician and/or patient perspectives. Their analysis can help identify relevant manifestations throughout the disease course, enhancing knowledge of disease progression and guiding clinicians in appropriate care provision. However, the analysis of questionnaires in health studies is not straightforward as made of repeated, ordinal, and potentially multidimensional item data. Sum-score summaries may considerably reduce information and hamper interpretation; items' changes over time occur along clinical progression; and as many other longitudinal processes, observations may be truncated by events. This work establishes a comprehensive strategy in four consecutive steps to leverage repeated ordinal data from multidimensional questionnaires. The 4S method successively (1) identifies the questionnaire structure into dimensions satisfying three calibration assumptions (unidimensionality, conditional independence, increasing monotonicity), (2) describes each dimension progression using a joint latent process model which includes a continuous-time item response theory model for the longitudinal subpart, (3) aligns each dimension progression with disease stages through a projection approach, and (4) identifies the most informative items across disease stages using the Fisher information. The method is applied to multiple system atrophy (MSA), a rare neurodegenerative disease, with the analysis of daily activity and motor impairments over disease progression. The 4S method provides an effective and complete analytical strategy for questionnaires repeatedly collected in health studies.

Biometrics, vol. 81, no. 4 (Journal Article).
Citations: 0
Correction to: Covariate-Adjusted Response-Adaptive Randomization for Multi-Arm Clinical Trials Using a Modified Forward Looking Gittins Index Rule.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-10-08 DOI: 10.1093/biomtc/ujaf139
{"title":"Correction to: Covariate-Adjusted Response-Adaptive Randomization for Multi-Arm Clinical Trials Using a Modified Forward Looking Gittins Index Rule.","authors":"","doi":"10.1093/biomtc/ujaf139","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf139","url":null,"abstract":"","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 4","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145372147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Large row-constrained supersaturated designs for high-throughput screening.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-10-08 DOI: 10.1093/biomtc/ujaf160
Byran J Smucker, Stephen E Wright, Isaac Williams, Richard C Page, Andor J Kiss, Surendra Bikram Silwal, Maria Weese, David J Edwards

High-throughput screening, in which large numbers of compounds are traditionally studied one-at-a-time in multiwell plates against specific targets, is widely used across many areas of the biological sciences, including drug discovery. To improve the effectiveness of these screens, we propose a new class of supersaturated designs that guide the construction of pools of compounds in each well. Because the size of the pools is typically limited by the particular application, the new designs accommodate this constraint and are part of a larger procedure that we call Constrained Row Screening or CRowS. We develop an efficient computational procedure to construct the CRowS designs, provide some initial lower bounds on the average squared off-diagonal values of their main-effects information matrix, and study the impact of the constraint on design quality. We also show via simulation that CRowS is statistically superior to the traditional one-compound-one-well approach as well as an existing pooling method, and demonstrate the use of the new methodology on a Verona Integron-encoded Metallo-$\beta$-lactamase-2 assay.

Citations: 0
Statistical inference on high-dimensional covariate-dependent Gaussian graphical regressions.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-10-08 DOI: 10.1093/biomtc/ujaf165
Xuran Meng, Jingfei Zhang, Yi Li

In many genomic studies, gene co-expression graphs are influenced by subject-level covariates like single nucleotide polymorphisms. Traditional Gaussian graphical models ignore these covariates and estimate only population-level networks, potentially masking important heterogeneity. Covariate-dependent Gaussian graphical regressions address this limitation by regressing the precision matrix on covariates, thereby modeling how graph structures vary with high-dimensional subject-specific covariates. To fit the model, we adopt a multi-task learning approach that achieves lower error rates than node-wise regressions. Yet, the important problem of statistical inference in this setting remains largely unexplored. We propose a class of debiased estimators based on multi-task learners, which can be computed quickly and separately. In a key step, we introduce a novel projection technique for estimating the inverse covariance matrix, reducing optimization costs to scale with the sample size n. Our debiased estimators achieve fast convergence and asymptotic normality, enabling valid inference. Simulations demonstrate the utility of the method, and an application to a brain cancer gene-expression dataset reveals meaningful biological relationships.

Citations: 0