
The New England Journal of Statistics in Data Science: Latest Articles

Algorithm-Based Optimal and Efficient Exact Experimental Designs for Crossover and Interference Models
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds41
S. Hao, Min Yang, Weiwei Zheng
Crossover models and interference models are frequently used in clinical trials, agricultural studies, social studies, etc. While some theoretical optimality results are available, it is still challenging to apply them in practice. Because of the complexity of exact optimal designs, the available theoretical results typically require specific combinations of the number of treatments (t), periods (p), and subjects (n). A more flexible method is to build integer programs based on approximate design theory, which can handle general cases of $(t,p,n)$. Nonetheless, those results are generally derived for specific models or design problems, and new efforts are needed for new problems. These obstacles make the application of the theoretical results rather difficult. Here we propose a new algorithm, a revision of the optimal weight exchange algorithm by [1]. It provides efficient crossover designs quickly under various situations, for different optimality criteria, different parameters of interest, different configurations of $(t,p,n)$, as well as arbitrary dropout scenarios. To facilitate the usage of our algorithm, the corresponding R package and an R Shiny app, which serves as a more user-friendly interface, have been developed.
Citations: 1
Subdata Selection With a Large Number of Variables
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds36
Rakhi Singh, J. Stufken
Subdata selection from big data is an active area of research that facilitates inferences based on big data with limited computational expense. For linear regression models, the optimal design-inspired Information-Based Optimal Subdata Selection (IBOSS) method is a computationally efficient method for selecting subdata that has excellent statistical properties. But the method can only be used if the subdata size, k, is at least twice the number of regression variables, p. In addition, even when $k\ge 2p$, under the assumption of effect sparsity, one can expect to obtain subdata with better statistical properties by trying to focus on active variables. Inspired by recent efforts to extend the IBOSS method to situations with a large number of variables p, we introduce a method called Combining Lasso And Subdata Selection (CLASS) that, as shown, improves on other proposed methods in terms of variable selection and building a predictive model based on subdata when the full data size n is very large and the number of variables p is large. In terms of computational expense, CLASS is more expensive than recent competitors for moderately large values of n, but the roles reverse under effect sparsity for extremely large values of n.
Citations: 2
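The requirement in the abstract above that the subdata size k be at least twice the number of variables p follows from how IBOSS picks rows: for each variable in turn it takes the r = k/(2p) remaining rows with the smallest values and the r with the largest. The following is a minimal Python sketch of that rule, not the authors' implementation. The function name `iboss_select` and the list-of-rows input format are assumptions made for illustration, and a full sort is used for simplicity where a faster partial selection could be substituted.

```python
import random


def iboss_select(X, k):
    """Return sorted row indices of a subdata set chosen IBOSS-style.

    X is a list of n rows, each a list of p numbers (hypothetical format).
    For each column in turn, the r = k // (2p) not-yet-selected rows with
    the smallest values and the r with the largest values are selected.
    """
    n, p = len(X), len(X[0])
    r = k // (2 * p)
    if r < 1:
        raise ValueError("need k >= 2p so that r = k // (2p) is at least 1")
    if n < k:
        raise ValueError("subdata size k cannot exceed the full data size n")
    selected = set()
    for j in range(p):
        # Rank the rows that are still available by their value in column j.
        remaining = sorted((i for i in range(n) if i not in selected),
                           key=lambda i: X[i][j])
        selected.update(remaining[:r])    # r smallest values of variable j
        selected.update(remaining[-r:])   # r largest values of variable j
    return sorted(selected)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(1000)]
    print(len(iboss_select(data, 60)))  # 2 * r * p rows are selected
```

When k is not a multiple of 2p, this sketch returns 2·r·p rows, which is slightly fewer than k.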
Simultaneous False-Decision Error Rates in Master Protocols with Shared Control: False Discovery Rate Perspective
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds28
Jingjing Ye, X. Li, Cheng Lu, William Wang
A master protocol is a type of trial design in which multiple therapies and/or multiple disease populations can be investigated in the same trial. A shared control can be used for multiple therapies to gain operational efficiency and to make the trial more attractive to patients. To balance between controlling the false positive rate and having adequate power for detecting true signals, the impact of the False Discovery Rate (FDR) is evaluated when multiple investigational drugs are studied in the master protocol. With the shared control group, a “random high” or “random low” in the control group can potentially impact all hypothesis tests that compare each of the test regimens with the control group, in terms of the probability of having at least one positive hypothesis outcome, or multiple positive outcomes. When regulatory agencies make the decision of approving or declining one or more regimens based on the master protocol design, this introduces a different type of error: simultaneous false-decision error. In this manuscript, we examine in detail the derivations and properties of the simultaneous false-decision error in the master protocol with shared control under the framework of FDR. The simultaneous false-decision error consists of two parts: simultaneous false-discovery rate (SFDR) and simultaneous false non-discovery rate (SFNR). Based on our analytical evaluation and simulations, the magnitude of SFDR and SFNR inflation is small. Therefore, the multiple error rate controls are generally adequate; further adjusting SFDR or SFNR to a pre-specified level, or reducing the alpha allocated to each individual comparison of a treatment with the shared control, is deemed unnecessary.
Citations: 0
Sparse Estimation in Finite Mixture of Accelerated Failure Time and Mixture of Regression Models with R Package fmrs
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds49
Farhad Shokoohi
Variable selection in large-dimensional data has been extensively studied in different settings over the past decades. In a recent article, Shokoohi et al. [29, DOI:10.1214/18-AOAS1198] proposed a method for variable selection in finite mixture of accelerated failure time regression models for studies on time-to-event data to capture heterogeneity within the population and account for censoring. In this paper, we introduce the fmrs package, which implements the variable selection methodology for such models. Furthermore, as a byproduct, the fmrs package facilitates variable selection in finite mixture regression models. The package also incorporates a tuning parameter selection mechanism based on component-wise BIC. Commonly used penalties, such as the Least Absolute Shrinkage and Selection Operator and the Smoothly Clipped Absolute Deviation, are integrated into fmrs. Additionally, the package offers an option for non-mixture regression models. The C language is chosen to boost the optimization speed. We provide an overview of the fmrs principles and the strategies employed for optimization. Hands-on illustrations are presented to help users get acquainted with fmrs. Finally, we apply fmrs to a lung cancer dataset and observe that a two-component mixture model reveals a subgroup with a more aggressive form of the disease, displaying a lower survival time.
Citations: 0
Construction of Supersaturated Designs with Small Coherence for Variable Selection
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds34
Youran Qi, Peter Chien
The supersaturated design is often used to discover important factors in an experiment with a large number of factors and a small number of runs. We propose a method for constructing supersaturated designs with small coherence. Such designs are useful for variable selection methods such as the Lasso. Examples are provided to illustrate the proposed method.
Citations: 1
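To make the notion of "small coherence" in the abstract above concrete, the sketch below computes coherence under one common definition: the largest absolute cosine between any two distinct columns of the design matrix. This is an assumption for illustration, since the paper's exact definition (for example, whether columns are centered) is not given in the abstract. The function name `coherence` and the two small two-level designs are hypothetical.

```python
import math
from itertools import combinations


def coherence(X):
    """Largest absolute cosine between two distinct columns of X.

    X is a list of runs (rows); each run lists the factor settings.
    Assumed definition: max over i != j of |x_i . x_j| / (|x_i| |x_j|).
    """
    cols = list(zip(*X))
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    return max(
        abs(sum(a * b for a, b in zip(cols[i], cols[j]))) / (norms[i] * norms[j])
        for i, j in combinations(range(len(cols)), 2)
    )


# Illustrative 4-run designs with levels coded as -1 and +1.
orthogonal = [
    [1, 1, 1],
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
]
# Adding a fourth factor to four runs forces some columns to be correlated.
with_extra_factor = [row + [extra] for row, extra in zip(orthogonal, [1, 1, 1, -1])]

if __name__ == "__main__":
    print(coherence(orthogonal))         # columns are mutually orthogonal
    print(coherence(with_extra_factor))  # the extra column overlaps the others
```

A smaller value means the columns are closer to orthogonal, which is the property that helps Lasso-type methods separate active factors from inactive ones.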
Nature-inspired Metaheuristics for finding Optimal Designs for the Continuation-Ratio Models
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds44
Jiaheng Qiu, W. Wong
The continuation-ratio (CR) model is frequently used in dose response studies to model a three-category outcome as the dose levels vary. Design issues for a CR model defined on an unrestricted dose interval have been discussed for estimating model parameters or a selected function of the model parameters. This paper uses metaheuristics to address design issues for a CR model defined on any compact dose interval when there are one or more objectives in the study and some are more important than others. Specifically, we use an exemplary nature-inspired metaheuristic algorithm called particle swarm optimization (PSO) to find locally optimal designs for estimating a few interesting functions of the model parameters, such as the most effective dose ($MED$) and the maximum tolerated dose ($MTD$), and for estimating all parameters in a CR model. We demonstrate that PSO can efficiently find locally multiple-objective optimal designs for a CR model on various dose intervals, and a small simulation study shows it tends to outperform the popular deterministic cocktail algorithm (CA) and another competitive metaheuristic algorithm called differential evolution (DE). We also discuss hybrid algorithms and their flexible applications to design early Phase 2 trials or tackle biomedical problems, such as different strategies for handling the recent pandemic.
Citations: 0
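For readers unfamiliar with particle swarm optimization, which the abstract above applies to design search, here is a minimal generic PSO in Python. It is a sketch under stated assumptions, not the authors' configuration: the function name `pso_minimize`, the inertia and acceleration settings (`w`, `c1`, `c2`), and the quadratic test objective are all illustrative choices. In a design problem the objective would instead be a design criterion evaluated at candidate dose levels and weights.

```python
import random


def pso_minimize(f, bounds, n_particles=30, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over a box with a basic global-best particle swarm.

    bounds is a list of (low, high) pairs, one per dimension.
    Returns (best_position, best_value).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]               # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia plus pulls toward the personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the particle back into the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    # Stand-in objective with its minimum at (1, 1).
    best, value = pso_minimize(lambda x: sum((xi - 1.0) ** 2 for xi in x),
                               [(-5.0, 5.0), (-5.0, 5.0)])
    print(best, value)
```

Because PSO only needs function evaluations, the same loop can be pointed at criteria that are awkward for gradient-based methods, which is one reason metaheuristics are attractive for multiple-objective design problems.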
Discussion of: Four Types of Frequentism and Their Interplay with Bayesianism, by J. Berger
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds4c
Judith Rousseau
Citations: 1
Indeterminate Data and Handling for Assessing Diagnostic Performance in Imaging Drug Developments
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds46
Sue-Jane Wang
In diagnostic imaging drug development, the imaging scan read data in controlled imaging drug clinical trials include test positive and test negative. Broadly speaking, the standard of reference data are either presence or absence of a disease or clinical condition. Together, these data are used to assess the diagnostic performance of an investigational imaging drug in a controlled imaging drug clinical trial. For those imaging scan read data that cannot be called positive/negative, the “indeterminate” category is commonly used to cover imaging results that may be considered intermediate, indeterminate, or uninterpretable. Similarly, for those standard of reference data that cannot be categorized into presence/absence, including uncollected or unavailable reference standard data, the “indeterminate” category may be used. Historically, little attention has been paid to the indeterminate imaging scan read data, as they are generally rare or considered irrelevant, though they are related to scanned subjects and can be informative. Subjects who lack the standard of reference are simply excluded, so the study reports the analysis results only in subjects with available standard of reference data. This is known as completer analysis and is similar to the analysis of evaluable subjects seen in controlled trials for drug development. To improve diagnostic clinical trial planning, this paper introduces five attributes of an estimand in diagnostic imaging drug clinical trials. The paper then defines the indeterminate data mechanisms and gives examples for each indeterminate mechanism that is specific to the clinical context of a diagnostic imaging drug clinical trial. Several imputation approaches to handling indeterminate data are discussed. Depending on the clinical question of primary interest, indeterminate data may be intercurrent events. The paper ends with discussions on imputations of intercurrent events occurring in indeterminate imaging scan read data and those occurring in indeterminate standard of reference data when encountered in diagnostic imaging clinical trials, and provides points to consider on estimands for diagnostic imaging drug development.
Citations: 0
General Additive Network Effect Models
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds29
Trang Bui, Stefan H. Steiner, Nathaniel T. Stevens
In the interest of business innovation, social network companies often carry out experiments to test product changes and new ideas. In such experiments, users are typically assigned to one of two experimental conditions, with some outcome of interest observed and compared. In this setting, the outcome of one user may be influenced not only by the condition to which they are assigned but also by the conditions of other users via their network connections. This challenges classical experimental design and analysis methodologies and requires specialized methods. We introduce the general additive network effect (GANE) model, which encompasses many existing outcome models in the literature under a unified model-based framework. The model is both interpretable and flexible in modeling the treatment effect as well as the network influence. We show that (quasi) maximum likelihood estimators are consistent and asymptotically normal for a family of model specifications. Quantities of interest such as the global treatment effect are defined and expressed as functions of the GANE model parameters, and hence inference can be carried out using likelihood theory. We further propose the “power-degree” (POW-DEG) specification of the GANE model. The performance of POW-DEG and other specifications of the GANE model is investigated via simulations. Under model misspecification, the POW-DEG specification appears to work well. Finally, we study the characteristics of good experimental designs for the POW-DEG specification. We find that graph-cluster randomization and balanced designs are not necessarily optimal for precise estimation of the global treatment effect, indicating the need for alternative design strategies.
Citations: 1
Invited Discussion of J.O. Berger: Four Types of Frequentism and Their Interplay with Bayesianism
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds4b
L. Pericchi
One of the merits of this far-reaching article is to show that not all “Frequentisms” are equal. Furthermore, there are frequentist approaches that are scientifically compelling, notably the “Empirical Frequentist” (EP), which can be paraphrased as “The proof of the pudding is in the eating”. Somewhat surprising to some (but anticipated in Wald’s admissibility theorems in decision theory) is the conclusion that the easiest and best way to achieve the EP property is through Bayesian reasoning or, perhaps more exactly, through Objective Bayesian reasoning. (I am avoiding the expression “Empirical Bayesian reasoning”, which would be appropriate if it were not associated with a very particular group of methods. It is argued below that a better name would be “Bayes Empirical”.) I concentrate on Hypothesis Testing, since that is the most challenging area of deeper disagreement among schools. From this substantive classification of Frequentisms emerges the opportunity for a convergence between schools, which is even more satisfying than a compromise. This may only be fully achieved if the prior probabilities are known, which is not usually the case. However, particularly in Hypothesis Testing, prior probabilities can and should be estimated and their uncertainty acknowledged in a Bayesian way. This may perhaps be termed Bayes Empirical: the systematic empirical study of Prior Possibilities based on relevant data, acknowledging its uncertainty.
Citations: 1