Biometrics: Latest Articles

Federated double machine learning for high-dimensional semiparametric models.
IF 1.7 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-10-08 | Vol. 81, No. 4 | DOI: 10.1093/biomtc/ujaf150
Kai Kang, Zhihao Wu, Xinjie Qian, Xinyuan Song, Hongtu Zhu

Federated learning enables the training of a global model while keeping data localized; however, current methods face challenges with high-dimensional semiparametric models that involve complex nuisance parameters. This paper proposes a federated double machine learning framework designed to address high-dimensional nuisance parameters of semiparametric models in multicenter studies. Our approach leverages double machine learning (Chernozhukov et al., 2018a) to estimate center-specific parameters, extends the surrogate efficient score method within a Neyman-orthogonal framework, and applies density ratio tilting to create a federated estimator that combines local individual-level data with summary statistics from other centers. This methodology mitigates regularization bias and overfitting in high-dimensional nuisance parameter estimation. We establish the estimator's limiting distribution under minimal assumptions, validate its performance through extensive simulations, and demonstrate its effectiveness in analyzing multiphase data from the Alzheimer's Disease Neuroimaging Initiative study.
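For readers unfamiliar with the single-center building block, here is a minimal sketch of cross-fitted double machine learning for a partially linear model Y = θD + g(X) + ε, D = m(X) + v, in the spirit of Chernozhukov et al. (2018a). It is not the authors' federated estimator: it omits the surrogate efficient score and density ratio tilting steps entirely. Ridge regression is used as a stand-in nuisance learner (an assumption; any machine learning regressor could be substituted), and the function names are hypothetical.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, lam=1.0):
    """Ridge regression nuisance learner (illustrative stand-in for any ML method)."""
    # Center with training means so the intercept is not penalized.
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    p = X_train.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y_train - y_mean))
    return y_mean + (X_test - x_mean) @ beta


def dml_plm(y, d, X, n_folds=2, lam=1.0, seed=0):
    """Cross-fitted DML estimate of theta in Y = theta*D + g(X) + eps.

    Uses the partialling-out (Neyman-orthogonal) score and returns
    (theta_hat, standard_error).
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # balanced random fold labels
    y_res = np.empty(n)
    d_res = np.empty(n)
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # Fit E[Y|X] and E[D|X] on the other folds and residualize this fold.
        y_res[test] = y[test] - ridge_fit_predict(X[train], y[train], X[test], lam)
        d_res[test] = d[test] - ridge_fit_predict(X[train], d[train], X[test], lam)
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    # Sandwich standard error from the orthogonal score psi = (Y~ - theta D~) D~.
    psi = (y_res - theta * d_res) * d_res
    j = np.mean(d_res ** 2)
    se = np.sqrt(np.mean(psi ** 2) / j ** 2 / n)
    return theta, se
```

Cross-fitting (estimating nuisances on folds other than the one being residualized) is what limits the overfitting bias the abstract mentions; the orthogonal score makes θ̂ insensitive to first-order errors in the nuisance fits.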

Citations: 0
Bridging the gap between design and analysis: randomization inference and sensitivity analysis for matched observational studies with treatment doses.
IF 1.7 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-10-08 | Vol. 81, No. 4 | DOI: 10.1093/biomtc/ujaf156
Jeffrey Zhang, Siyu Heng

Matching is a commonly used causal inference study design in observational studies. Through matching on measured confounders between different treatment groups, valid randomization inferences can be conducted under the no unmeasured confounding assumption, and sensitivity analysis can be further performed to assess the robustness of results to potential unmeasured confounding. However, for many common matched designs, there is still a lack of valid downstream randomization inference and sensitivity analysis methods. Specifically, in matched observational studies with treatment doses (eg, continuous or ordinal treatments), with the exception of some special cases such as pair matching, there is no existing randomization inference or sensitivity analysis method for studying analogs of the sample average treatment effect (ie, Neyman-type weak nulls), and no existing valid sensitivity analysis approach for testing the sharp null of no treatment effect for any subject (ie, Fisher's sharp null) when the outcome is nonbinary. To fill these important gaps, we propose new methods for randomization inference and sensitivity analysis that work for general matched designs with treatment doses, apply to general types of outcome variables (eg, binary, ordinal, or continuous), and cover both Fisher's sharp null and Neyman-type weak nulls. We illustrate our methods via comprehensive simulation studies and a real data application. All the proposed methods have been incorporated into the R package doseSens.
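The starting point these methods build on, a randomization test of Fisher's sharp null in a matched design with doses, can be sketched as follows. The sketch assumes no unmeasured confounding, so that within each matched set every permutation of the observed doses among the set's units is equally likely. The statistic (sum of dose times outcome), the Monte Carlo approximation, and the function name are illustrative choices; this does not implement the authors' sensitivity analysis, the Neyman-type weak-null tests, or anything from doseSens.

```python
import numpy as np


def dose_randomization_test(doses, outcomes, set_ids, n_draws=5000, seed=0):
    """Monte Carlo randomization p-value for Fisher's sharp null of no effect.

    Under the sharp null, outcomes are fixed; with no unmeasured confounding,
    doses are exchangeable within each matched set, so doses are permuted
    within sets. Statistic: T = sum_i dose_i * outcome_i (one-sided, large T
    suggests a positive dose effect).
    """
    doses = np.asarray(doses, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    set_ids = np.asarray(set_ids)
    rng = np.random.default_rng(seed)
    t_obs = np.sum(doses * outcomes)
    groups = [np.flatnonzero(set_ids == s) for s in np.unique(set_ids)]
    exceed = 0
    permuted = doses.copy()
    for _ in range(n_draws):
        # Draw one within-set reassignment of doses.
        for idx in groups:
            permuted[idx] = rng.permutation(doses[idx])
        if np.sum(permuted * outcomes) >= t_obs:
            exceed += 1
    # The add-one correction keeps the Monte Carlo p-value valid.
    return (exceed + 1) / (n_draws + 1)
```

A sensitivity analysis would replace the uniform within-set permutation distribution with a worst case over biased assignment probabilities; that step is beyond this sketch.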

Citations: 0
Estimating heterogeneous treatment effects for general responses.
IF 1.7 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-10-08 | Vol. 81, No. 4 | DOI: 10.1093/biomtc/ujaf162
Zijun Gao, Trevor Hastie

Heterogeneous treatment effect models allow us to compare treatments at the subgroup level and are becoming increasingly popular in applications such as personalized medicine, advertising, and education. Regardless of the type of response (continuous, binary, count, or survival), most causal estimands focus on the difference between the treatment and control conditional means. In this paper, we propose an alternative estimand, DINA (the DIfference in NAtural parameters), to quantify heterogeneous treatment effects, motivated by exponential families and the Cox model. Whatever the type of response, DINA is both convenient and more practical for modeling the influence of covariates on the treatment effect. Additionally, we introduce a meta-algorithm for DINA that enables practitioners to use powerful off-the-shelf machine learning tools to estimate nuisance functions. This meta-algorithm is also statistically robust to errors in the nuisance function estimation. We demonstrate the efficacy of our method, combined with various machine learning base-learners, on both simulated and real datasets.
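To make the estimand concrete: for an exponential-family response, DINA compares treated and control conditional means on the natural-parameter scale instead of the mean scale. The sketch below evaluates it for Bernoulli (natural parameter = log-odds), Poisson (natural parameter = log-mean) and unit-variance Gaussian (natural parameter = mean) responses, given conditional means mu1(x) and mu0(x) that are assumed to come from some fitted outcome models. It illustrates the estimand only, not the authors' meta-algorithm; the function names are hypothetical.

```python
import math


def natural_parameter(mu, family):
    """Canonical (natural) parameter of an exponential family as a function of its mean."""
    if family == "bernoulli":
        return math.log(mu / (1.0 - mu))  # log-odds
    if family == "poisson":
        return math.log(mu)  # log-mean
    if family == "gaussian":
        return mu  # variance fixed at 1: natural parameter equals the mean
    raise ValueError(f"unsupported family: {family}")


def dina(mu_treated, mu_control, family):
    """Difference in natural parameters between treatment and control at one covariate value."""
    return natural_parameter(mu_treated, family) - natural_parameter(mu_control, family)
```

For example, `dina(0.5, 0.2, "bernoulli")` is the log odds ratio log 4, and `dina(6.0, 3.0, "poisson")` is the log rate ratio log 2. Two subgroups with different baseline risks but the same odds ratio share the same DINA even though their risk differences differ, which is one reason the natural-parameter scale can be simpler to model as a function of covariates.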

Citations: 0
Structuring, sequencing, staging, selecting: the 4S method for the longitudinal analysis of multidimensional questionnaires in chronic diseases.
IF 1.7 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-10-08 | Vol. 81, No. 4 | DOI: 10.1093/biomtc/ujaf163
Tiphaine Saulnier, Wassilios G Meissner, Margherita Fabbri, Alexandra Foubert-Samier, Cécile Proust-Lima

In clinical studies, questionnaires are often used to report disease-related manifestations from the clinician and/or patient perspective. Their analysis can help identify relevant manifestations throughout the disease course, enhancing knowledge of disease progression and guiding clinicians in providing appropriate care. However, analyzing questionnaires in health studies is not straightforward, because they produce repeated, ordinal, and potentially multidimensional item data. Sum-score summaries may considerably reduce information and hamper interpretation; item changes over time occur along clinical progression; and, as in many other longitudinal processes, observations may be truncated by events. This work establishes a comprehensive four-step strategy for leveraging repeated ordinal data from multidimensional questionnaires. The 4S method successively (1) structures the questionnaire into dimensions satisfying three calibration assumptions (unidimensionality, conditional independence, and increasing monotonicity), (2) describes the progression of each dimension using a joint latent process model that includes a continuous-time item response theory model for the longitudinal submodel, (3) aligns each dimension's progression with disease stages through a projection approach, and (4) identifies the most informative items across disease stages using the Fisher information. The method is applied to multiple system atrophy (MSA), a rare neurodegenerative disease, to analyze daily-activity and motor impairments over disease progression. The 4S method provides an effective and complete analytical strategy for questionnaires repeatedly collected in health studies.
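Step (4) ranks items by their Fisher information along the latent trait. As an illustration only, the sketch below computes the information of a single ordinal item under a graded response (cumulative logit) model with discrimination `a` and ordered thresholds b_1 < ... < b_{K-1}. This parameterization is an assumption and is simpler than the continuous-time item response model inside the joint latent process model used in the paper; the function name is hypothetical. With category probabilities P_k(θ), the item information is I(θ) = Σ_k P_k'(θ)² / P_k(θ).

```python
import numpy as np


def graded_item_information(theta, a, thresholds):
    """Fisher information of one ordinal item under a graded response (cumulative logit) model.

    Categories 0..K-1, with P(Y >= k | theta) = logistic(a * (theta - b_k)) for k = 1..K-1.
    Assumes theta is not so extreme that a category probability underflows to 0.
    """
    b = np.asarray(thresholds, dtype=float)
    # Cumulative probabilities, padded with P(Y >= 0) = 1 and P(Y >= K) = 0.
    p_star = np.concatenate(([1.0], 1.0 / (1.0 + np.exp(-a * (theta - b))), [0.0]))
    dp_star = a * p_star * (1.0 - p_star)  # d/dtheta; zero at the padded boundaries
    probs = p_star[:-1] - p_star[1:]       # category probabilities P_k(theta)
    dprobs = dp_star[:-1] - dp_star[1:]    # their derivatives P_k'(theta)
    return float(np.sum(dprobs ** 2 / probs))
```

Evaluating this over a grid of θ values mapped to disease stages, and comparing items at each stage, mirrors the idea of selecting the most informative items per stage. For a binary item (one threshold) it reduces to the familiar a²P(1 − P).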

Citations: 0
Super learner for survival prediction in case-cohort and generalized case-cohort studies.
IF 1.7 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-10-08 | Vol. 81, No. 4 | DOI: 10.1093/biomtc/ujaf155
Haolin Li, Haibo Zhou, David Couper, Jianwen Cai

The case-cohort study design is often used in modern epidemiological studies of rare diseases, as it can achieve efficiency similar to that of a much larger cohort study at a fraction of the cost. Previous work focused on parameter estimation for case-cohort studies under a particular statistical model, but few studies have addressed survival prediction under this type of design. In this article, we propose a super learner algorithm for survival prediction in case-cohort studies, and we further extend it to generalized case-cohort studies. The proposed super learner algorithm is shown to have asymptotic model selection consistency as well as uniform consistency. We also demonstrate that our algorithm has satisfactory finite-sample performance. Simulation studies suggest that super learners trained on data from case-cohort and generalized case-cohort studies achieve better prediction accuracy than those trained on data from a simple random sampling design with the same sample size. Finally, we apply the proposed method to a generalized case-cohort study conducted as part of the Atherosclerosis Risk in Communities Study.
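The core mechanics of a super learner under case-cohort sampling can be illustrated with a heavily simplified sketch: a discrete super learner (the cross-validation selector) that uses inverse-probability weights, where every case is sampled (weight 1) and non-case subcohort members represent 1/α cohort members. To keep it short, the event indicator by a fixed horizon is treated as a binary outcome and censoring is ignored (an assumption); the paper's algorithm handles survival outcomes and generalized case-cohort designs, which this does not. The two candidate learners and all function names are hypothetical.

```python
import numpy as np


def case_cohort_weights(is_case, alpha):
    """Inverse sampling-probability weights: cases 1, non-case subcohort members 1/alpha."""
    return np.where(is_case, 1.0, 1.0 / alpha)


def fit_weighted_mean(X, y, w):
    """Candidate 1: constant risk equal to the weighted event rate."""
    m = np.sum(w * y) / np.sum(w)
    return lambda Xn: np.full(len(Xn), m)


def fit_weighted_linear(X, y, w):
    """Candidate 2: weighted least-squares linear risk model, clipped to [0, 1]."""
    Z = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
    return lambda Xn: np.clip(np.column_stack([np.ones(len(Xn)), Xn]) @ coef, 0.0, 1.0)


def discrete_super_learner(X, y, w, learners, n_folds=5, seed=0):
    """Pick the learner with the smallest cross-validated weighted squared-error risk."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    risks = []
    for fit in learners:
        loss = 0.0
        for k in range(n_folds):
            test = folds == k
            predict = fit(X[~test], y[~test], w[~test])
            # Weighting the held-out loss makes it estimate the full-cohort risk.
            loss += np.sum(w[test] * (y[test] - predict(X[test])) ** 2)
        risks.append(loss / np.sum(w))
    best = int(np.argmin(risks))
    return best, learners[best](X, y, w), risks
```

Without the weights, the oversampled cases would make the selected model overstate risk for the full cohort; the weighting restores a cohort-representative loss. A full super learner would also form a weighted combination of the candidates rather than picking a single one.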

Citations: 0
A semiparametric method for addressing underdiagnosis using electronic health record data.
IF 1.7 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-10-08 | Vol. 81, No. 4 | DOI: 10.1093/biomtc/ujaf157
Weidong Ma, Jordana B Cohen, Jinbo Chen

Effective treatment of medical conditions begins with an accurate diagnosis. However, many conditions are underdiagnosed, either overlooked entirely or diagnosed only after significant delays. Electronic health records (EHRs) contain extensive patient health information, offering an opportunity to probabilistically identify underdiagnosed individuals. The rationale is that diagnosed and underdiagnosed patients may display similar health profiles in EHR data that distinguish them from condition-free patients. Thus, EHR data can be leveraged to develop models that assess an individual's risk of having a condition. To date, this opportunity has largely remained unexploited, partly due to the lack of suitable statistical methods. The key challenge is the positive-unlabeled structure of EHR data, which consist of data for diagnosed ("positive") patients and for the remaining ("unlabeled") patients, who include underdiagnosed patients and many condition-free patients. Consequently, data on patients who are unambiguously condition-free, which are essential for developing risk assessment models, are unavailable. To overcome this challenge, we propose ascertaining condition statuses for a small subset of unlabeled patients. We develop a novel statistical method that uses this supplemented EHR data to build accurate models estimating the probability that a patient has the condition of interest. We study the asymptotic properties of our method and assess its finite-sample performance through simulation studies. Finally, we apply our method to develop a preliminary model for identifying potentially underdiagnosed non-alcoholic steatohepatitis patients using data from Penn Medicine EHRs.
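To see why ascertaining a subset of unlabeled patients removes the identifiability problem, consider a simple inverse-probability-weighted baseline (not the authors' semiparametric method). Assume that the validated patients are a simple random sample of the unlabeled patients with a known validation fraction π and that a logistic model for P(condition | x) is adequate; both are assumptions of this sketch, and the function names are hypothetical. Then diagnosed patients (status 1, weight 1) together with validated unlabeled patients (ascertained status, weight 1/π) represent the whole EHR population, and a weighted logistic regression targets P(condition | x).

```python
import numpy as np


def weighted_logistic(X, y, w, n_iter=50, tol=1e-10):
    """Weighted logistic regression with an intercept, fitted by Newton-Raphson."""
    Z = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (w * (y - p))
        hess = Z.T @ (Z * (w * p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def pu_validated_fit(X_pos, X_unl_validated, status_validated, validation_fraction):
    """IPW fit of P(condition | x) from positives plus a validated random subset of unlabeled patients."""
    X = np.vstack([X_pos, X_unl_validated])
    y = np.concatenate([np.ones(len(X_pos)), status_validated])
    # Each validated unlabeled patient stands in for 1/pi unlabeled patients.
    w = np.concatenate([
        np.ones(len(X_pos)),
        np.full(len(X_unl_validated), 1.0 / validation_fraction),
    ])
    return weighted_logistic(X, y, w)
```

Without the validated subset, one would only have positives versus a contaminated unlabeled pool, and P(condition | x) would not be identifiable without further assumptions.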

Citations: 0
Correction to: Covariate-Adjusted Response-Adaptive Randomization for Multi-Arm Clinical Trials Using a Modified Forward Looking Gittins Index Rule.
IF 1.7 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-10-08 | Vol. 81, No. 4 | DOI: 10.1093/biomtc/ujaf139
Citations: 0
Large row-constrained supersaturated designs for high-throughput screening.
IF 1.7 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-10-08 | Vol. 81, No. 4 | DOI: 10.1093/biomtc/ujaf160
Byran J Smucker, Stephen E Wright, Isaac Williams, Richard C Page, Andor J Kiss, Surendra Bikram Silwal, Maria Weese, David J Edwards

High-throughput screening, in which large numbers of compounds are traditionally tested one at a time against specific targets in multiwell plates, is widely used across many areas of the biological sciences, including drug discovery. To improve the effectiveness of these screens, we propose a new class of supersaturated designs that guide the construction of pools of compounds in each well. Because the size of the pools is typically limited by the particular application, the new designs accommodate this constraint and are part of a larger procedure that we call Constrained Row Screening, or CRowS. We develop an efficient computational procedure to construct CRowS designs, provide initial lower bounds on the average squared off-diagonal values of their main-effects information matrix, and study the impact of the constraint on design quality. We also show via simulation that CRowS is statistically superior both to the traditional one-compound-one-well approach and to an existing pooling method, and we demonstrate the new methodology on a Verona Integron-encoded Metallo-β-lactamase-2 assay.
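The design-quality measure mentioned above, the average squared off-diagonal entry of the main-effects information matrix, can be computed directly. In the sketch below, the information matrix is taken to be FᵀF, where F is the 0/1 design matrix (rows = wells, columns = compounds, 1 = compound in the pool) with an intercept column prepended; this coding is an assumption, and the paper may center or scale differently. The random row-constrained generator is a hypothetical baseline that respects the pool-size limit, not the CRowS construction.

```python
import numpy as np


def random_row_constrained_design(n_runs, n_factors, pool_size, seed=0):
    """Random 0/1 design in which every row (well) pools exactly pool_size compounds."""
    rng = np.random.default_rng(seed)
    X = np.zeros((n_runs, n_factors), dtype=int)
    for i in range(n_runs):
        X[i, rng.choice(n_factors, size=pool_size, replace=False)] = 1
    return X


def avg_sq_offdiag(X, intercept=True):
    """Mean of squared off-diagonal entries of F^T F (smaller = closer to orthogonal)."""
    X = np.asarray(X, dtype=float)
    F = np.column_stack([np.ones(len(X)), X]) if intercept else X
    M = F.T @ F
    off = M[~np.eye(M.shape[0], dtype=bool)]
    return float(np.mean(off ** 2))
```

For instance, a supersaturated layout with 24 wells, 100 compounds and at most 4 compounds per well could be scored with `avg_sq_offdiag(random_row_constrained_design(24, 100, 4))`; an optimized design should score lower than such random baselines under the same constraint.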

Citations: 0
Statistical inference on high-dimensional covariate-dependent Gaussian graphical regressions.
IF 1.7 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-10-08 | Vol. 81, No. 4 | DOI: 10.1093/biomtc/ujaf165
Xuran Meng, Jingfei Zhang, Yi Li

In many genomic studies, gene co-expression graphs are influenced by subject-level covariates like single nucleotide polymorphisms. Traditional Gaussian graphical models ignore these covariates and estimate only population-level networks, potentially masking important heterogeneity. Covariate-dependent Gaussian graphical regressions address this limitation by regressing the precision matrix on covariates, thereby modeling how graph structures vary with high-dimensional subject-specific covariates. To fit the model, we adopt a multi-task learning approach that achieves lower error rates than node-wise regressions. Yet, the important problem of statistical inference in this setting remains largely unexplored. We propose a class of debiased estimators based on multi-task learners, which can be computed quickly and separately. In a key step, we introduce a novel projection technique for estimating the inverse covariance matrix, reducing optimization costs to scale with the sample size n. Our debiased estimators achieve fast convergence and asymptotic normality, enabling valid inference. Simulations demonstrate the utility of the method, and an application to a brain cancer gene-expression dataset reveals meaningful biological relationships.
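The idea of "regressing the precision matrix on covariates" can be made concrete with a small sketch. Here the subject-specific precision matrix is assumed to be linear in the covariates, Ω(u) = B₀ + Σₕ uₕ Bₕ (a hypothetical form chosen for illustration; the paper's exact parameterization may differ), and the subject's gene network is read off through partial correlations ρᵢⱼ(u) = −ωᵢⱼ(u) / √(ωᵢᵢ(u) ωⱼⱼ(u)). Edge (i, j) is present for a subject exactly when ωᵢⱼ(u) ≠ 0. No estimation or debiasing is shown, and the function names are hypothetical.

```python
import numpy as np


def precision_at(u, B0, B):
    """Subject-specific precision matrix Omega(u) = B0 + sum_h u[h] * B[h].

    u has shape (q,), B0 shape (p, p), B shape (q, p, p). The caller must ensure
    Omega(u) is positive definite for the covariate values of interest.
    """
    return B0 + np.tensordot(u, B, axes=1)


def partial_correlations(omega):
    """Partial correlation matrix implied by a precision matrix."""
    d = np.sqrt(np.diag(omega))
    rho = -omega / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho
```

In this form, a covariate such as a SNP genotype switches an edge on or off, or changes its strength, through the corresponding entries of Bₕ; inference on those entries is what the proposed debiased estimators target.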

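The abstract describes regressing the precision matrix on covariates and names node-wise regressions as the baseline that the authors' multi-task learner improves on. As a rough illustration of the model only (not the authors' multi-task or debiased estimator), the Python sketch below simulates a hypothetical 3-node graph whose single edge strength depends linearly on one scalar covariate `u`, then recovers that dependence with a node-wise least-squares regression of one node on the other nodes and their interactions with `u`. It relies on the standard Gaussian fact that the regression coefficient of x_j on x_k equals -ω_jk / ω_jj. The unit diagonal, the edge parameterization `0.3 + 0.2 * u`, the sample size, and the function names `precision`, `simulate`, and `nodewise_fit` are all assumptions made for illustration, in a low-dimensional setting rather than the high-dimensional one the paper targets.

```python
import numpy as np


def precision(u):
    # Hypothetical 3-node precision matrix with unit diagonal (an assumption);
    # only the (0, 1) edge is nonzero, and its strength depends linearly on u.
    omega = np.eye(3)
    omega[0, 1] = omega[1, 0] = -(0.3 + 0.2 * u)
    return omega


def simulate(n, rng):
    # Draw one scalar covariate per subject, then x_i ~ N(0, Omega(u_i)^{-1}).
    u = rng.uniform(-1.0, 1.0, size=n)
    cov = np.linalg.inv(np.stack([precision(ui) for ui in u]))  # shape (n, 3, 3)
    chol = np.linalg.cholesky(cov)  # batched lower-triangular factors
    z = rng.standard_normal((n, 3))
    x = np.einsum("nij,nj->ni", chol, z)  # x_i = L_i z_i has covariance cov_i
    return u, x


def nodewise_fit(u, x, j):
    # Node-wise regression: regress node j on the other nodes and on their
    # interactions with u. With omega_jj = 1, the coefficient on x_k is
    # -omega_jk(u) = b0 + b1 * u, so the two coefficient groups estimate b0, b1.
    others = [k for k in range(x.shape[1]) if k != j]
    base = x[:, others]
    design = np.hstack([base, u[:, None] * base])
    coef, *_ = np.linalg.lstsq(design, x[:, j], rcond=None)
    m = len(others)
    return dict(zip(others, coef[:m])), dict(zip(others, coef[m:]))


rng = np.random.default_rng(0)
u, x = simulate(20_000, rng)
b0, b1 = nodewise_fit(u, x, j=0)
print("baseline edge weights:", b0)
print("covariate effects on edges:", b1)
```

With this many samples, the estimates for the edge to node 1 should land close to 0.3 (baseline) and 0.2 (covariate effect), while those for node 2, which shares no edge with node 0, should be close to zero. In the high-dimensional regime the paper addresses, plain least squares like this is no longer usable, which is where penalized multi-task fitting and debiasing come in.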
{"title":"Statistical inference on high-dimensional covariate-dependent Gaussian graphical regressions.","authors":"Xuran Meng, Jingfei Zhang, Yi Li","doi":"10.1093/biomtc/ujaf165","DOIUrl":"10.1093/biomtc/ujaf165","url":null,"abstract":"<p><p>In many genomic studies, gene co-expression graphs are influenced by subject-level covariates like single nucleotide polymorphisms. Traditional Gaussian graphical models ignore these covariates and estimate only population-level networks, potentially masking important heterogeneity. Covariate-dependent Gaussian graphical regressions address this limitation by regressing the precision matrix on covariates, thereby modeling how graph structures vary with high-dimensional subject-specific covariates. To fit the model, we adopt a multi-task learning approach that achieves lower error rates than node-wise regressions. Yet, the important problem of statistical inference in this setting remains largely unexplored. We propose a class of debiased estimators based on multi-task learners, which can be computed quickly and separately. In a key step, we introduce a novel projection technique for estimating the inverse covariance matrix, reducing optimization costs to scale with the sample size n. Our debiased estimators achieve fast convergence and asymptotic normality, enabling valid inference. Simulations demonstrate the utility of the method, and an application to a brain cancer gene-expression dataset reveals meaningful biological relationships.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 4","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12720500/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145802935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Rejoinder to Letter to the Editors "Comments on 'Statistical inference on change points in generalized semiparametric segmented models' by Yang et al. (2025)" by Vito M.R. Muggeo.
IF 1.7 Zone 4 Mathematics Q3 BIOLOGY Pub Date: 2025-10-08 DOI: 10.1093/biomtc/ujaf148
Guangyu Yang, Min Zhang
{"title":"Rejoinder to Letter to the Editors \"Comments on 'Statistical inference on change points in generalized semiparametric segmented models' by Yang et al. (2025)\" by Vito M.R. Muggeo.","authors":"Guangyu Yang, Min Zhang","doi":"10.1093/biomtc/ujaf148","DOIUrl":"10.1093/biomtc/ujaf148","url":null,"abstract":"","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":" ","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145602083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0