Biometrics: Latest Articles

Towards automated animal density estimation with acoustic spatial capture-recapture.
IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2024-07-01 | DOI: 10.1093/biomtc/ujae081
Yuheng Wang, Juan Ye, Xiaohui Li, David L Borchers

Passive acoustic monitoring can be an effective way of monitoring wildlife populations that are acoustically active but difficult to survey visually, but identifying target species calls in recordings is non-trivial. Machine learning (ML) techniques can do detection quickly but may miss calls and produce false positives, i.e., misidentify calls from other sources as being from the target species. While abundance estimation methods can address the former issue effectively, methods to deal with false positives are under-investigated. We propose an acoustic spatial capture-recapture (ASCR) method that deals with false positives by treating species identity as a latent variable. Individual-level outputs from ML techniques are treated as random variables whose distributions depend on the latent identity. This gives rise to a mixture model likelihood that we maximize to estimate call density. We compare our method to existing methods by applying it to an ASCR survey of frogs and simulated acoustic surveys of gibbons based on real gibbon acoustic data. Estimates from our method are closer to ASCR applied to the dataset without false positives than those from a widely used false positive "correction factor" method. Simulations show our method to have bias close to zero and accurate coverage probabilities and to perform substantially better than ASCR without accounting for false positives.
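
The latent-identity idea in this abstract can be illustrated with a deliberately simplified sketch. It is not the authors' ASCR likelihood: it ignores the spatial capture-recapture component entirely and only shows how a classifier score whose distribution depends on a hidden species label leads to a two-component mixture. The Gaussian score distributions, their parameters, and the function names are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def estimate_target_fraction(scores, target=(0.8, 0.1), other=(0.3, 0.1), iters=200):
    """EM estimate of the share of detections that come from the target species.

    `target` and `other` are hypothetical (mean, sd) pairs describing the
    classifier score for true calls and for false positives; they are
    treated as known here to keep the sketch short.
    """
    pi = 0.5  # initial guess for the mixing weight
    for _ in range(iters):
        # E-step: posterior probability that each detection is a true call.
        resp = []
        for s in scores:
            a = pi * normal_pdf(s, *target)
            b = (1 - pi) * normal_pdf(s, *other)
            resp.append(a / (a + b))
        # M-step: the mixing weight is the average posterior probability.
        pi = sum(resp) / len(resp)
    return pi


if __name__ == "__main__":
    scores = [0.8, 0.8, 0.8, 0.3]  # toy classifier scores
    print(round(estimate_target_fraction(scores), 3))
```

In the method the abstract describes, the mixture likelihood is maximized jointly with the capture-recapture model to estimate call density; the sketch above covers only the mixing-weight step.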

Citations: 0
Testing for similarity of multivariate mixed outcomes using generalized joint regression models with application to efficacy-toxicity responses.
IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2024-07-01 | DOI: 10.1093/biomtc/ujae077
Niklas Hagemann, Giampiero Marra, Frank Bretz, Kathrin Möllenhoff

A common problem in clinical trials is to test whether the effect of an explanatory variable on a response of interest is similar between two groups, for example, patient or treatment groups. In this regard, similarity is defined as equivalence up to a pre-specified threshold that denotes an acceptable deviation between the two groups. This issue is typically tackled by assessing if the explanatory variable's effect on the response is similar. This assessment is based on, for example, confidence intervals of differences or a suitable distance between two parametric regression models. Typically, these approaches build on the assumption of a univariate continuous or binary outcome variable. However, multivariate outcomes, especially beyond the case of bivariate binary responses, remain underexplored. This paper introduces an approach based on a generalized joint regression framework exploiting the Gaussian copula. Compared to existing methods, our approach accommodates various outcome variable scales, such as continuous, binary, categorical, and ordinal, including mixed outcomes in multi-dimensional spaces. We demonstrate the validity of this approach through a simulation study and an efficacy-toxicity case study, hence highlighting its practical relevance.

Citations: 0
Summary statistics knockoffs inference with family-wise error rate control.
IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2024-07-01 | DOI: 10.1093/biomtc/ujae082
Catherine Xinrui Yu, Jiaqi Gu, Zhaomeng Chen, Zihuai He

Testing multiple hypotheses of conditional independence with provable error rate control is a fundamental problem with various applications. To infer conditional independence with family-wise error rate (FWER) control when only summary statistics of marginal dependence are accessible, we adopt GhostKnockoff to directly generate knockoff copies of summary statistics and propose a new filter to select features conditionally dependent on the response. In addition, we develop a computationally efficient algorithm to greatly reduce the computational cost of knockoff copies generation without sacrificing power and FWER control. Experiments on simulated data and a real dataset of Alzheimer's disease genetics demonstrate the advantage of the proposed method over existing alternatives in both statistical power and computational efficiency.

Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11367731/pdf/
Citations: 0
Semiparametric inference of effective reproduction number dynamics from wastewater pathogen surveillance data.
IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2024-07-01 | DOI: 10.1093/biomtc/ujae074
Isaac H Goldstein, Daniel M Parker, Sunny Jiang, Volodymyr M Minin

Concentrations of pathogen genomes measured in wastewater have recently become available as a new data source to use when modeling the spread of infectious diseases. One promising use for this data source is inference of the effective reproduction number, the average number of individuals a newly infected person will infect. We propose a model where new infections arrive according to a time-varying immigration rate which can be interpreted as an average number of secondary infections produced by one infectious individual per unit time. This model allows us to estimate the effective reproduction number from concentrations of pathogen genomes, while avoiding difficult-to-verify assumptions about the dynamics of the susceptible population. As a byproduct of our primary goal, we also produce a new model for estimating the effective reproduction number from case data using the same framework. We test this modeling framework in an agent-based simulation study with a realistic data-generating mechanism which accounts for the time-varying dynamics of pathogen shedding. Finally, we apply our new model to estimating the effective reproduction number of SARS-CoV-2, the causative agent of COVID-19, in Los Angeles, CA, using pathogen RNA concentrations collected from a large wastewater treatment facility.

Citations: 0
PathGPS: discover shared genetic architecture using GWAS summary data.
IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2024-07-01 | DOI: 10.1093/biomtc/ujae060
Zijun Gao, Qingyuan Zhao, Trevor Hastie

The increasing availability and scale of biobanks and "omic" datasets bring new horizons for understanding biological mechanisms. PathGPS is an exploratory data analysis tool to discover genetic architectures using Genome Wide Association Studies (GWAS) summary data. PathGPS is based on a linear structural equation model where traits are regulated by both genetic and environmental pathways. PathGPS decouples the genetic and environmental components by contrasting the GWAS associations of "signal" genes with those of "noise" genes. From the estimated genetic component, PathGPS then extracts genetic pathways via principal component and factor analysis, leveraging the low-rank and sparse properties. In addition, we provide a bootstrap aggregating ("bagging") algorithm to improve stability under data perturbation and hyperparameter tuning. When applied to a metabolomics dataset and the UK Biobank, PathGPS confirms several known gene-trait clusters and suggests multiple new hypotheses for future investigations.

Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247175/pdf/
Citations: 0
Adjusting for incomplete baseline covariates in randomized controlled trials: a cross-world imputation framework.
IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2024-07-01 | DOI: 10.1093/biomtc/ujae094
Yilin Song, James P Hughes, Ting Ye

In randomized controlled trials, adjusting for baseline covariates is commonly used to improve the precision of treatment effect estimation. However, covariates often have missing values. Recently, Zhao and Ding studied two simple strategies, the single imputation method and missingness-indicator method (MIM), to handle missing covariates and showed that both methods can provide an efficiency gain compared to not adjusting for covariates. To better understand and compare these two strategies, we propose and investigate a novel theoretical imputation framework termed cross-world imputation (CWI). This framework includes both single imputation and MIM as special cases, facilitating the comparison of their efficiency. Through the lens of CWI, we show that MIM implicitly searches for the optimal CWI values and thus achieves optimal efficiency. We also derive conditions under which the single imputation method, by searching for the optimal single imputation values, can achieve the same efficiency as the MIM. We illustrate our findings through simulation studies and a real data analysis based on the Childhood Adenotonsillectomy Trial. We conclude by discussing the practical implications of our findings.
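
As a rough illustration of the missingness-indicator method mentioned above, the sketch below builds the two columns that would enter the adjustment model. It assumes the common formulation in which missing values are filled with a constant and a 0/1 indicator is added as an extra covariate; the function name and the `fill` default are hypothetical, and the regression step itself is left out.

```python
def missingness_indicator_columns(values, fill=0.0):
    """Return (imputed, missing) columns for one covariate.

    `values` uses None for a missing entry. `imputed` replaces each
    missing entry with `fill`; `missing` is 1 where the entry was
    missing and 0 otherwise.
    """
    imputed = [fill if v is None else v for v in values]
    missing = [1 if v is None else 0 for v in values]
    return imputed, missing


if __name__ == "__main__":
    baseline_bmi = [22.5, None, 27.1, None]  # toy covariate with gaps
    imputed, missing = missingness_indicator_columns(baseline_bmi)
    print(imputed)
    print(missing)
```

Both columns would then be included as covariates when estimating the treatment effect.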

Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11398886/pdf/
Citations: 0
Hypothesis tests in ordinal predictive models with optimal accuracy.
IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2024-07-01 | DOI: 10.1093/biomtc/ujae079
Yuyang Liu, Shan Luo, Jialiang Li

In real-world applications involving multi-class ordinal discrimination, a common approach is to aggregate multiple predictive variables into a linear combination, aiming to develop a classifier with high prediction accuracy. Assessment of such multi-class classifiers often utilizes the hypervolume under ROC manifolds (HUM). When dealing with a substantial pool of potential predictors and achieving optimal HUM, it becomes imperative to conduct appropriate statistical inference. However, prevalent methodologies in existing literature are computationally expensive. We propose to use the jackknife empirical likelihood method to address this issue. The Wilks' theorem under moderate conditions is established and the power analysis under the Pitman alternative is provided. We also introduce a novel network-based rapid computation algorithm specifically designed for computing a general multi-sample $U$-statistic in our test procedure. To compare our approach against existing approaches, we conduct extensive simulations. Results demonstrate the superior performance of our method in terms of test size, power, and implementation time. Furthermore, we apply our method to analyze a real medical dataset and obtain some new findings.
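
To make the HUM concrete, here is a brute-force sketch of its usual empirical form as a multi-sample U-statistic: the fraction of tuples, one score drawn from each ordered class, that appear in strictly increasing order. This is an illustration only, not the rapid network-based algorithm the abstract proposes; the function name is hypothetical and ties are simply counted as incorrect.

```python
from itertools import product


def empirical_hum(*class_scores):
    """Fraction of tuples (one score per class) in strictly increasing order.

    Classes are passed in their ordinal order. The cost grows with the
    product of the class sizes, so this is only suitable for small samples.
    """
    total = 0
    correct = 0
    for combo in product(*class_scores):
        total += 1
        if all(a < b for a, b in zip(combo, combo[1:])):
            correct += 1
    return correct / total


if __name__ == "__main__":
    mild, moderate, severe = [1, 4], [2, 5], [3, 6]  # toy scores per class
    print(empirical_hum(mild, moderate, severe))
```

For M classes, a classifier with no discriminative ability has an expected HUM of 1/M!, so values should be read against that baseline rather than against 0.5.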

Citations: 0
High-dimensional multivariate analysis of variance via geometric median and bootstrapping.
IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2024-07-01 | DOI: 10.1093/biomtc/ujae088
Guanghui Cheng, Ruitao Lin, Liuhua Peng

The geometric median, which is applicable to high-dimensional data, can be viewed as a generalization of the univariate median used in 1-dimensional data. It can be used as a robust estimator for identifying the location of multi-dimensional data and has a wide range of applications in real-world scenarios. This paper explores the problem of high-dimensional multivariate analysis of variance (MANOVA) using the geometric median. A maximum-type statistic that relies on the differences between the geometric medians among various groups is introduced. The distribution of the new test statistic is derived under the null hypothesis using Gaussian approximations, and its consistency under the alternative hypothesis is established. To approximate the distribution of the new statistic in high dimensions, a wild bootstrap algorithm is proposed and theoretically justified. Through simulation studies conducted across a variety of dimensions, sample sizes, and data-generating models, we demonstrate the finite-sample performance of our geometric median-based MANOVA method. Additionally, we implement the proposed approach to analyze a breast cancer gene expression dataset.
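
For readers unfamiliar with the geometric median, the sketch below computes it with Weiszfeld's iteration, a standard algorithm for the point that minimizes the sum of Euclidean distances to the data. The abstract does not say how the authors compute it, so the choice of algorithm, the function name, and the tolerances are assumptions.

```python
import math


def geometric_median(points, tol=1e-9, max_iter=1000):
    """Weiszfeld iteration for the minimizer of the summed Euclidean distances."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    est = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(max_iter):
        num = [0.0] * dim
        denom = 0.0
        for p in points:
            d = math.dist(p, est)
            if d < 1e-12:
                # Skip a point that coincides with the current estimate
                # to avoid dividing by zero.
                continue
            w = 1.0 / d
            denom += w
            for k in range(dim):
                num[k] += w * p[k]
        if denom == 0.0:
            return est
        new = [v / denom for v in num]
        if math.dist(new, est) < tol:
            return new
        est = new
    return est


if __name__ == "__main__":
    # Four corners of the unit square plus one gross outlier.
    data = [(0, 0), (1, 0), (0, 1), (1, 1), (100, 100)]
    print(geometric_median(data))
```

On this toy data the coordinate-wise mean is pulled out to (20.4, 20.4) by the outlier, while the geometric median stays inside the unit square, which is the robustness property the abstract relies on.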

Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11381952/pdf/
Citations: 0
Discussion on "LEAP: the latent exchangeability prior for borrowing information from historical data" by Ethan M. Alt, Xiuya Chang, Xun Jiang, Qing Liu, May Mo, H. Amy Xia, and Joseph G. Ibrahim.
IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2024-07-01 | DOI: 10.1093/biomtc/ujae086
Shannon D Thomas, Alexander M Kaizer

This discussion provides commentary on the paper by Ethan M. Alt, Xiuya Chang, Xun Jiang, Qing Liu, May Mo, H. Amy Xia, and Joseph G. Ibrahim entitled "LEAP: the latent exchangeability prior for borrowing information from historical data". The authors propose a novel method to bridge the incorporation of supplemental information into a study while also identifying potentially exchangeable subgroups to better facilitate information sharing. In this discussion, we highlight the potential relationship with other Bayesian model averaging approaches, such as multisource exchangeability modeling, and provide a brief numeric case study to illustrate how the concepts behind latent exchangeability prior may also improve the performance of existing methods. The results provided by Alt et al. are exciting, and we believe that the method represents a meaningful approach to more efficient information sharing.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11427888/pdf/
Citations: 0
A Bayesian nonparametric approach for causal mediation with a post-treatment confounder.
IF 1.4 Zone 4 Mathematics Q3 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomtc/ujae099
Woojung Bae, Michael J Daniels, Michael G Perri

We propose a new Bayesian nonparametric method for estimating the causal effects of mediation in the presence of a post-treatment confounder. The methodology is motivated by the Rural Lifestyle Intervention Treatment Effectiveness Trial (Rural LITE), in which there is interest in estimating causal mediation effects, but estimation is complicated by the presence of a post-treatment confounder. We specify an enriched Dirichlet process mixture (EDPM) to model the joint distribution of the observed data (outcome, mediator, post-treatment confounder, treatment, and baseline confounders). For identifiability, we use the extended version of the standard sequential ignorability (SI) assumption introduced in Hong et al., along with a Gaussian copula model assumption. The observed data model and causal identification assumptions enable us to identify and estimate the causal effects of mediation, that is, the natural direct effects (NDE) and natural indirect effects (NIE). Our method enables easy computation of NIE and NDE for a subset of confounding variables and addresses missing data through data augmentation under the assumption of ignorable missingness. We conduct simulation studies to assess the performance of our proposed method. Furthermore, we apply this approach to evaluate the causal mediation effect in the Rural LITE trial and find no strong evidence for the potential mediator.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11418020/pdf/
Citations: 0