
Biometrics: Latest Articles

Multiply robust estimation of marginal structural models in observational studies subject to covariate-driven observations.
IF 1.4 | Zone 4, Mathematics | Q3 BIOLOGY | Pub Date: 2024-07-01 | DOI: 10.1093/biomtc/ujae065
Janie Coulombe, Shu Yang

Electronic health records and other sources of observational data are increasingly used for drawing causal inferences. The estimation of a causal effect using these data not meant for research purposes is subject to confounding and irregularly-spaced covariate-driven observation times affecting the inference. A doubly-weighted estimator accounting for these features has previously been proposed that relies on the correct specification of two nuisance models used for the weights. In this work, we propose a novel consistent multiply robust estimator and demonstrate analytically and in comprehensive simulation studies that it is more flexible and more efficient than the only alternative estimator proposed for the same setting. It is further applied to data from the Add Health study in the United States to estimate the causal effect of therapy counseling on alcohol consumption in American adolescents.
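
To make the idea of a doubly-weighted estimator concrete, here is a minimal sketch, not the estimator from the paper or from the earlier work it cites. It assumes, for illustration only, that each observed visit is weighted by the inverse of the product of two nuisance-model outputs: the probability of the treatment actually received and the relative intensity of being observed. The function name `doubly_weighted_effect` and the record layout are hypothetical.

```python
def doubly_weighted_effect(records):
    """Weighted difference in mean outcome between treated and untreated visits.

    records: iterable of (treated, outcome, p_treat, visit_rate), one per observed visit.
    p_treat is the modeled probability of the treatment actually received;
    visit_rate is the modeled relative intensity of being observed at that time.
    Both come from nuisance models and are taken as given (an assumption of this sketch).
    """
    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for treated, outcome, p_treat, visit_rate in records:
        # Combined weight: inverse treatment probability times inverse visit intensity.
        w = 1.0 / (p_treat * visit_rate)
        num[treated] += w * outcome
        den[treated] += w
    # Normalized (Hajek-type) weighted means in each arm, then their difference.
    return num[1] / den[1] - num[0] / den[0]
```

If either nuisance model is wrong, the weights are wrong and so is the estimate, which is the dependence on correct specification that the abstract says a multiply robust estimator relaxes.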

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11250490/pdf/
Citations: 0
Nonparametric second-order estimation for spatiotemporal point patterns.
IF 1.4 | Zone 4, Mathematics | Q3 BIOLOGY | Pub Date: 2024-07-01 | DOI: 10.1093/biomtc/ujae071
Decai Liang, Jialing Liu, Ye Shen, Yongtao Guan

Many existing methodologies for analyzing spatiotemporal point patterns are developed based on the assumption of stationarity in both space and time for the second-order intensity or pair correlation. In practice, however, such an assumption often lacks validity or proves to be unrealistic. In this paper, we propose a novel and flexible nonparametric approach for estimating the second-order characteristics of spatiotemporal point processes, accommodating non-stationary temporal correlations. Our proposed method employs kernel smoothing and effectively accounts for spatial and temporal correlations differently. Under a spatially increasing-domain asymptotic framework, we establish consistency of the proposed estimators, which can be constructed using different first-order intensity estimators to enhance practicality. Simulation results reveal that our method, in comparison with existing approaches, significantly improves statistical efficiency. An application to a COVID-19 dataset further illustrates the flexibility and interpretability of our procedure.

Citations: 0
Joint structure learning and causal effect estimation for categorical graphical models.
IF 1.4 | Zone 4, Mathematics | Q3 BIOLOGY | Pub Date: 2024-07-01 | DOI: 10.1093/biomtc/ujae067
Federico Castelletti, Guido Consonni, Marco L Della Vedova

The scope of this paper is a multivariate setting involving categorical variables. Following an external manipulation of one variable, the goal is to evaluate the causal effect on an outcome of interest. A typical scenario involves a system of variables representing lifestyle, physical and mental features, symptoms, and risk factors, with the outcome being the presence or absence of a disease. These variables are interconnected in complex ways, allowing the effect of an intervention to propagate through multiple paths. A distinctive feature of our approach is the estimation of causal effects while accounting for uncertainty in both the dependence structure, which we represent through a directed acyclic graph (DAG), and the DAG-model parameters. Specifically, we propose a Markov chain Monte Carlo algorithm that targets the joint posterior over DAGs and parameters, based on an efficient reversible-jump proposal scheme. We validate our method through extensive simulation studies and demonstrate that it outperforms current state-of-the-art procedures in terms of estimation accuracy. Finally, we apply our methodology to analyze a dataset on depression and anxiety in undergraduate students.
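
As an illustration of what a causal effect means for categorical variables once a DAG is fixed, the sketch below computes P(Y=y | do(X=x)) with the standard back-door adjustment formula. It is not the paper's method: it assumes a single known DAG (Z -> X, Z -> Y, X -> Y) and a known joint distribution, whereas the paper averages over uncertainty in both. The function name `do_probability` and all numbers are made up for the example.

```python
def do_probability(joint, x, y):
    """P(Y=y | do(X=x)) = sum_z P(Z=z) * P(Y=y | X=x, Z=z).

    joint[(z, x, y)] holds P(Z=z, X=x, Y=y) for the assumed DAG Z -> X, Z -> Y, X -> Y.
    """
    z_values = {z for (z, _, _) in joint}
    total = 0.0
    for z in z_values:
        p_z = sum(p for (zz, _, _), p in joint.items() if zz == z)
        p_zx = sum(p for (zz, xx, _), p in joint.items() if zz == z and xx == x)
        if p_zx > 0:
            total += p_z * joint.get((z, x, y), 0.0) / p_zx
    return total

# Illustrative numbers (not from the paper): Z confounds X and Y.
joint = {}
for z in (0, 1):
    for x in (0, 1):
        for y in (0, 1):
            p_x1 = 0.8 if z == 1 else 0.2        # P(X=1 | Z=z)
            p_y1 = 0.1 + 0.3 * x + 0.4 * z       # P(Y=1 | X=x, Z=z)
            p_x = p_x1 if x == 1 else 1.0 - p_x1
            p_y = p_y1 if y == 1 else 1.0 - p_y1
            joint[(z, x, y)] = 0.5 * p_x * p_y   # P(Z=z) = 0.5

# Causal effect of setting X=1 versus X=0 on P(Y=1).
effect = do_probability(joint, 1, 1) - do_probability(joint, 0, 1)
```

In this toy table the interventional contrast equals the 0.3 coefficient on X used to build it, while the raw conditional contrast P(Y=1 | X=1) - P(Y=1 | X=0) is larger because Z raises both X and Y.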

Citations: 0
An interpretable Bayesian clustering approach with feature selection for analyzing spatially resolved transcriptomics data.
IF 1.4 | Zone 4, Mathematics | Q3 BIOLOGY | Pub Date: 2024-07-01 | DOI: 10.1093/biomtc/ujae066
Huimin Li, Bencong Zhu, Xi Jiang, Lei Guo, Yang Xie, Lin Xu, Qiwei Li

Recent breakthroughs in spatially resolved transcriptomics (SRT) technologies have enabled comprehensive molecular characterization at the spot or cellular level while preserving spatial information. Cells are the fundamental building blocks of tissues, organized into distinct yet connected components. Although many non-spatial and spatial clustering approaches have been used to partition the entire region into mutually exclusive spatial domains based on the SRT high-dimensional molecular profile, most require an ad hoc selection of less interpretable dimensional-reduction techniques. To overcome this challenge, we propose a zero-inflated negative binomial mixture model to cluster spots or cells based on their molecular profiles. To increase interpretability, we employ a feature selection mechanism to provide a low-dimensional summary of the SRT molecular profile in terms of discriminating genes that shed light on the clustering result. We further incorporate the SRT geospatial profile via a Markov random field prior. We demonstrate how this joint modeling strategy improves clustering accuracy, compared with alternative state-of-the-art approaches, through simulation studies and 3 real data applications.
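
For readers unfamiliar with the zero-inflated negative binomial (ZINB) distribution named in the abstract, the following is a minimal sketch of its log-probability mass function for a single count. It is not the authors' implementation; the mean/dispersion parameterization (`mu`, `phi`), the zero-inflation probability `pi0`, and the function names are assumptions chosen for illustration.

```python
import math

def nb_logpmf(y, mu, phi):
    """Log-pmf of a negative binomial with mean mu > 0 and dispersion phi > 0
    (variance mu + mu**2 / phi)."""
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (phi + mu))
            + y * math.log(mu / (phi + mu)))

def zinb_logpmf(y, mu, phi, pi0):
    """Log-pmf of a zero-inflated NB: with probability pi0 (0 <= pi0 < 1)
    the count is a structural zero, otherwise it is drawn from NB(mu, phi)."""
    if y == 0:
        # A zero can come from either the structural-zero part or the NB part.
        return math.log(pi0 + (1.0 - pi0) * math.exp(nb_logpmf(0, mu, phi)))
    return math.log(1.0 - pi0) + nb_logpmf(y, mu, phi)
```

In a mixture model of this kind, each cluster would carry its own parameters, and a spot's cluster membership would be inferred from the sum of such log-probabilities over genes; that step, along with the feature selection and Markov random field prior, is not shown here.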

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11285114/pdf/
Citations: 0
The multivariate Bernoulli detector: change point estimation in discrete survival analysis.
IF 1.4 | Zone 4, Mathematics | Q3 BIOLOGY | Pub Date: 2024-07-01 | DOI: 10.1093/biomtc/ujae075
Willem van den Boom, Maria De Iorio, Fang Qian, Alessandra Guglielmi

Time-to-event data are often recorded on a discrete scale with multiple, competing risks as potential causes for the event. In this context, application of continuous survival analysis methods with a single risk suffers from biased estimation. Therefore, we propose the multivariate Bernoulli detector for competing risks with discrete times involving a multivariate change point model on the cause-specific baseline hazards. Through the prior on the number of change points and their location, we impose dependence between change points across risks, as well as allowing for data-driven learning of their number. Then, conditionally on these change points, a multivariate Bernoulli prior is used to infer which risks are involved. Focus of posterior inference is cause-specific hazard rates and dependence across risks. Such dependence is often present due to subject-specific changes across time that affect all risks. Full posterior inference is performed through a tailored local-global Markov chain Monte Carlo (MCMC) algorithm, which exploits a data augmentation trick and MCMC updates from nonconjugate Bayesian nonparametric methods. We illustrate our model in simulations and on ICU data, comparing its performance with existing approaches.
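
The quantities named in the abstract can be related with a short sketch. Given cause-specific hazards on a discrete time grid, overall survival and the cumulative incidence of each cause follow from standard identities. This is not the paper's model or its MCMC algorithm; the piecewise-constant hazard with one change point and the function name `cumulative_incidence` are illustrative assumptions.

```python
def cumulative_incidence(hazards):
    """hazards[t][k]: probability of an event of cause k at time t+1,
    given no event through time t.

    Returns (survival, cif), where survival[t] = P(T > t+1) and
    cif[t][k] = P(T <= t+1, cause = k).
    """
    n_causes = len(hazards[0])
    surv_prev = 1.0
    running = [0.0] * n_causes
    survival, cif = [], []
    for h_t in hazards:
        # An event of cause k at this time requires surviving all earlier times.
        running = [c + surv_prev * h for c, h in zip(running, h_t)]
        surv_prev *= 1.0 - sum(h_t)
        survival.append(surv_prev)
        cif.append(list(running))
    return survival, cif

# Two causes; the hazard of cause 0 changes from 0.10 to 0.30 after the
# second time point (made-up numbers), while cause 1 stays at 0.05.
hazards = [[0.10, 0.05], [0.10, 0.05], [0.30, 0.05], [0.30, 0.05]]
survival, cif = cumulative_incidence(hazards)
```

At every time point the survival probability and the cumulative incidences of all causes sum to one, which is a quick sanity check on any fitted set of cause-specific hazards.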

Citations: 0
Towards automated animal density estimation with acoustic spatial capture-recapture.
IF 1.4 | Zone 4, Mathematics | Q3 BIOLOGY | Pub Date: 2024-07-01 | DOI: 10.1093/biomtc/ujae081
Yuheng Wang, Juan Ye, Xiaohui Li, David L Borchers

Passive acoustic monitoring can be an effective way of monitoring wildlife populations that are acoustically active but difficult to survey visually, but identifying target species calls in recordings is non-trivial. Machine learning (ML) techniques can do detection quickly but may miss calls and produce false positives, i.e., misidentify calls from other sources as being from the target species. While abundance estimation methods can address the former issue effectively, methods to deal with false positives are under-investigated. We propose an acoustic spatial capture-recapture (ASCR) method that deals with false positives by treating species identity as a latent variable. Individual-level outputs from ML techniques are treated as random variables whose distributions depend on the latent identity. This gives rise to a mixture model likelihood that we maximize to estimate call density. We compare our method to existing methods by applying it to an ASCR survey of frogs and simulated acoustic surveys of gibbons based on real gibbon acoustic data. Estimates from our method are closer to ASCR applied to the dataset without false positives than those from a widely used false positive "correction factor" method. Simulations show our method to have bias close to zero and accurate coverage probabilities and to perform substantially better than ASCR without accounting for false positives.

Citations: 0
Testing for similarity of multivariate mixed outcomes using generalized joint regression models with application to efficacy-toxicity responses.
IF 1.4 | Zone 4, Mathematics | Q3 BIOLOGY | Pub Date: 2024-07-01 | DOI: 10.1093/biomtc/ujae077
Niklas Hagemann, Giampiero Marra, Frank Bretz, Kathrin Möllenhoff

A common problem in clinical trials is to test whether the effect of an explanatory variable on a response of interest is similar between two groups, for example, patient or treatment groups. In this regard, similarity is defined as equivalence up to a pre-specified threshold that denotes an acceptable deviation between the two groups. This issue is typically tackled by assessing if the explanatory variable's effect on the response is similar. This assessment is based on, for example, confidence intervals of differences or a suitable distance between two parametric regression models. Typically, these approaches build on the assumption of a univariate continuous or binary outcome variable. However, multivariate outcomes, especially beyond the case of bivariate binary responses, remain underexplored. This paper introduces an approach based on a generalized joint regression framework exploiting the Gaussian copula. Compared to existing methods, our approach accommodates various outcome variable scales, such as continuous, binary, categorical, and ordinal, including mixed outcomes in multi-dimensional spaces. We demonstrate the validity of this approach through a simulation study and an efficacy-toxicity case study, hence highlighting its practical relevance.
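
The confidence-interval approach that the abstract describes as typical for a single outcome can be sketched as follows. This is the familiar univariate CI-inclusion rule under a normal approximation, not the copula-based multivariate test proposed in the paper. The function name `similar_by_ci` and its arguments are hypothetical.

```python
from statistics import NormalDist

def similar_by_ci(diff_hat, se, delta, alpha=0.05):
    """Claim similarity if the (1 - 2*alpha) normal-approximation confidence
    interval for the between-group difference lies inside (-delta, delta).

    diff_hat: estimated difference in effects between the two groups
    se:       its standard error
    delta:    pre-specified similarity threshold (> 0)
    """
    z = NormalDist().inv_cdf(1.0 - alpha)
    lower, upper = diff_hat - z * se, diff_hat + z * se
    return (-delta < lower) and (upper < delta), (lower, upper)
```

For example, `similar_by_ci(0.1, 0.05, 0.3)` accepts similarity because the whole interval sits inside the threshold, whereas the same point estimate with a standard error of 0.2 does not: a small estimated difference alone is not evidence of similarity.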

Citations: 0
Summary statistics knockoffs inference with family-wise error rate control.
IF 1.4 | Zone 4, Mathematics | Q3 BIOLOGY | Pub Date: 2024-07-01 | DOI: 10.1093/biomtc/ujae082
Catherine Xinrui Yu, Jiaqi Gu, Zhaomeng Chen, Zihuai He

Testing multiple hypotheses of conditional independence with provable error rate control is a fundamental problem with various applications. To infer conditional independence with family-wise error rate (FWER) control when only summary statistics of marginal dependence are accessible, we adopt GhostKnockoff to directly generate knockoff copies of summary statistics and propose a new filter to select features conditionally dependent on the response. In addition, we develop a computationally efficient algorithm to greatly reduce the computational cost of knockoff copies generation without sacrificing power and FWER control. Experiments on simulated data and a real dataset of Alzheimer's disease genetics demonstrate the advantage of the proposed method over existing alternatives in both statistical power and computational efficiency.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11367731/pdf/
Citations: 0
Semiparametric inference of effective reproduction number dynamics from wastewater pathogen surveillance data.
IF 1.4 | Zone 4, Mathematics | Q3 BIOLOGY | Pub Date: 2024-07-01 | DOI: 10.1093/biomtc/ujae074
Isaac H Goldstein, Daniel M Parker, Sunny Jiang, Volodymyr M Minin

Concentrations of pathogen genomes measured in wastewater have recently become available as a new data source to use when modeling the spread of infectious diseases. One promising use for this data source is inference of the effective reproduction number, the average number of individuals a newly infected person will infect. We propose a model where new infections arrive according to a time-varying immigration rate, which can be interpreted as an average number of secondary infections produced by one infectious individual per unit time. This model allows us to estimate the effective reproduction number from concentrations of pathogen genomes, while avoiding difficult-to-verify assumptions about the dynamics of the susceptible population. As a byproduct of our primary goal, we also produce a new model for estimating the effective reproduction number from case data using the same framework. We test this modeling framework in an agent-based simulation study with a realistic data-generating mechanism which accounts for the time-varying dynamics of pathogen shedding. Finally, we apply our new model to estimating the effective reproduction number of SARS-CoV-2, the causative agent of COVID-19, in Los Angeles, CA, using pathogen RNA concentrations collected from a large wastewater treatment facility.
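
To illustrate what the effective reproduction number measures, the sketch below uses the classical renewal-equation ratio on a series of new-infection counts. This is a deliberately simpler, swapped-in technique, not the semiparametric wastewater model proposed in the paper. The function name `renewal_rt` and the generation-interval weights are assumptions for the example.

```python
def renewal_rt(incidence, gen_interval):
    """Ratio estimate R_t = I_t / sum_s w_s * I_{t-s}.

    incidence:    new infections per time step, I_0, I_1, ...
    gen_interval: weights w_1, w_2, ... (summing to 1) giving the probability
                  that a secondary infection occurs s steps after the primary one.
    Returns None where no past infections are available to form the denominator.
    """
    estimates = []
    for t in range(len(incidence)):
        # Infection pressure: past infections weighted by the generation interval.
        pressure = sum(w * incidence[t - s]
                       for s, w in enumerate(gen_interval, start=1) if t - s >= 0)
        estimates.append(incidence[t] / pressure if pressure > 0 else None)
    return estimates
```

With infections doubling each step and a one-step generation interval, `renewal_rt([10, 20, 40, 80], [1.0])` gives a ratio of 2 at every step after the first. Wastewater concentrations are not infection counts, which is part of why a dedicated model such as the one in the abstract is needed.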

Citations: 0
PathGPS: discover shared genetic architecture using GWAS summary data.
IF 1.4 Zone 4 Mathematics Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae060
Zijun Gao, Qingyuan Zhao, Trevor Hastie

The increasing availability and scale of biobanks and "omic" datasets bring new horizons for understanding biological mechanisms. PathGPS is an exploratory data analysis tool to discover genetic architectures using genome-wide association study (GWAS) summary data. PathGPS is based on a linear structural equation model where traits are regulated by both genetic and environmental pathways. PathGPS decouples the genetic and environmental components by contrasting the GWAS associations of "signal" genes with those of "noise" genes. From the estimated genetic component, PathGPS then extracts genetic pathways via principal component and factor analysis, leveraging the low-rank and sparse properties. In addition, we provide a bootstrap aggregating ("bagging") algorithm to improve stability under data perturbation and hyperparameter tuning. When applied to a metabolomics dataset and the UK Biobank, PathGPS confirms several known gene-trait clusters and suggests multiple new hypotheses for future investigations.

Citations: 0