
Latest publications in Biostatistics and Epidemiology

Statistical modeling methods: challenges and strategies
Q3 Medicine Pub Date : 2020-01-01 DOI: 10.1080/24709360.2019.1618653
Steven S. Henley, R. Golden, T. Kashner
ABSTRACT Statistical modeling methods are widely used in clinical science, epidemiology, and health services research to analyze data collected in clinical trials as well as in observational studies of existing data sources, such as claims files and electronic health records. Diagnostic and prognostic inferences from statistical models are critical to researchers advancing science, clinical practitioners making patient care decisions, and administrators and policy makers seeking to improve the quality and reduce the costs of the health care system. The veracity of such inferences relies not only on the quality and completeness of the collected data, but also on statistical model validity. A key component of establishing model validity is determining when a model is not correctly specified and therefore incapable of adequately representing the Data Generating Process (DGP). In this article, model validity is first described and methods for assessing model fit, specification, and selection are reviewed. Second, data transformations that improve the model’s ability to represent the DGP are addressed. Third, model search and validation methods are discussed. Finally, methods for evaluating predictive and classification performance are presented. Together, these methods provide a practical framework with recommendations to guide the development and evaluation of statistical models that support valid statistical inferences.
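Among the model selection tools the article reviews, information criteria are a common concrete instance. As a minimal, generic sketch (the log-likelihood values below are hypothetical, and the article's framework covers far more than this), the Akaike information criterion penalizes fit by model complexity:

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*ln(L); lower is preferred."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical candidate models fitted to the same data
models = {"main effects": (-1204.5, 4), "with interaction": (-1199.8, 6)}
for name, (ll, k) in models.items():
    print(f"{name}: AIC = {aic(ll, k):.1f}")
```

Here the richer model attains the lower AIC and would be preferred, but only because its likelihood gain outweighs the penalty for its two extra parameters.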
Biostatistics and Epidemiology. 2020;4(1):105–139.
Citations: 28
Developments and debates on latent variable modeling in diagnostic studies when there is no gold standard
Q3 Medicine Pub Date : 2019-10-15 DOI: 10.1080/24709360.2019.1673623
Zheyu Wang
Latent variable modeling is often used in diagnostic studies where a gold standard reference test is not available. Its applications have become increasingly popular with the fast discovery of novel biomarkers and the effort to improve healthcare for each individual. This paper attempts to provide a review of current developments and debates around these models, with a focus on diagnostic studies, and to discuss the value of these models as well as cautionary considerations in their application.
Biostatistics and Epidemiology. 2019;5(1):100–117.
Citations: 0
How many clusters exist? Answer via maximum clustering similarity implemented in R
Q3 Medicine Pub Date : 2019-01-01 DOI: 10.1080/24709360.2019.1615770
A. Albatineh, M. Wilcox, B. Zogheib, M. Niewiadomska-Bugaj
Finding the number of clusters in a data set is considered one of the fundamental problems in cluster analysis. This paper integrates maximum clustering similarity (MCS), a method for finding the optimal number of clusters, into the R statistical software through the package MCSim. The similarity between two clustering methods is calculated at the same number of clusters, using the Rand [Objective criteria for the evaluation of clustering methods. J Am Stat Assoc. 1971;66:846–850.] and Jaccard [The distribution of the flora of the alpine zone. New Phytologist. 1912;11:37–50.] indices, corrected for chance agreement. The number of clusters at which an index attains its maximum most frequently is a candidate for the optimal number of clusters. Unlike other criteria, MCS can be used with circular data. Seven clustering algorithms available in R are implemented in MCSim. A graph of the number of clusters vs. clustering similarity using the corrected similarity indices is produced, along with the values of the similarity indices and a clustering tree (dendrogram). Several examples, including simulated, real, and circular data sets, are presented to show how MCSim works successfully in practice.
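The chance-agreement correction applied to the Rand index can be made concrete. MCSim itself is an R package, so the following Python sketch is only an illustration of the adjusted (chance-corrected) Rand index computed from the contingency table of two clusterings:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Rand index corrected for chance agreement, from the contingency table."""
    n = len(labels_a)
    cells = Counter(zip(labels_a, labels_b))          # contingency table counts
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)       # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:                         # degenerate partitions
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

a = [0, 0, 1, 1, 2, 2]
b = [1, 1, 0, 0, 2, 2]                                # same partition, relabeled
print(adjusted_rand_index(a, b))                      # 1.0
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # negative: worse than chance
```

A value of 1 indicates identical partitions up to relabeling; values near 0 indicate agreement no better than chance, and negative values indicate worse-than-chance agreement.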
Biostatistics and Epidemiology. 2019;3(1):62–79.
Citations: 0
Cohort study design for illness-death processes with disease status under intermittent observation
Q3 Medicine Pub Date : 2019-01-01 DOI: 10.1080/24709360.2019.1699341
Nathalie C. Moon, Leilei Zeng, R. Cook
Cohort studies are routinely conducted to learn about the incidence or progression rates of chronic diseases. The illness-death model offers a natural framework for joint consideration of non-fatal events in the semi-competing risks setting. We consider the design of prospective cohort studies where the goal is to estimate the effect of a marker on the risk of a non-fatal event which is subject to interval-censoring due to an intermittent observation scheme. The sample size is shown to depend on the effect of interest, the number of assessments, and the duration of follow-up. Minimum-cost designs are also developed to account for the different costs of recruitment and follow-up examination. We also consider the setting where the event status of individuals is observed subject to misclassification; the consequent need to increase the sample size to account for this error is illustrated through asymptotic calculations.
Biostatistics and Epidemiology. 2019;3(1):178–200.
Citations: 0
Modified sparse functional principal component analysis for fMRI data process
Q3 Medicine Pub Date : 2019-01-01 DOI: 10.1080/24709360.2019.1591072
Zhengyang Fang, J. Y. Han, N. Simon, Xiaoping Zhou
Sparse and functional principal component analysis is a technique for extracting sparse and smooth principal components from a matrix. In this paper, we propose a modified sparse and functional principal component analysis model for feature extraction. We assess the tuning parameters by their robustness against random perturbation, and select them by derivative-free optimization. We test our algorithm on the ADNI dataset to distinguish patients with Alzheimer's disease from the control group. By applying appropriate classification methods to the sparse features, we obtain better results than classic singular value decomposition, support vector machines, and logistic regression.
Biostatistics and Epidemiology. 2019;3(1):80–89.
Citations: 1
A response adaptive design for ordinal categorical responses weighing the cumulative odds ratios
Q3 Medicine Pub Date : 2019-01-01 DOI: 10.1080/24709360.2019.1660111
A. Biswas, Rahul Bhattacharya, Soumyadeep Das
ABSTRACT Weighing the cumulative odds ratios suitably, a two-treatment response-adaptive design for phase III clinical trials is proposed for ordinal categorical responses. Properties of the proposed design are investigated theoretically as well as empirically. Applicability of the design is further verified using data pertaining to a real clinical trial with trauma patients, where the responses are observed on an ordinal categorical scale.
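The cumulative odds ratios being weighed can be illustrated with a small sketch (the counts below are hypothetical, not data from the trauma trial): for an ordinal response, the cumulative odds ratio compares the odds of falling at or below each cutpoint under the two treatments. Under a proportional-odds structure the ratio is the same at every cutpoint:

```python
def cumulative_odds_ratios(counts_a, counts_b):
    """Cumulative odds ratio at each cutpoint j: odds(Y <= j | A) / odds(Y <= j | B)."""
    n_a, n_b = sum(counts_a), sum(counts_b)
    ors, cum_a, cum_b = [], 0, 0
    for j in range(len(counts_a) - 1):   # cutpoints lie between adjacent categories
        cum_a += counts_a[j]
        cum_b += counts_b[j]
        ors.append((cum_a / (n_a - cum_a)) / (cum_b / (n_b - cum_b)))
    return ors

# Hypothetical ordinal outcome counts (worst -> best) under treatments A and B
print(cumulative_odds_ratios([10, 20, 30], [30, 20, 10]))  # [0.2, 0.2]
```

Both cutpoints give the same ratio here because the example counts happen to satisfy proportional odds; in general the K-1 ratios differ, which is why a weighting scheme across them is needed.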
Biostatistics and Epidemiology. 2019;3(1):109–125.
Citations: 2
Essential concepts of causal inference: a remarkable history and an intriguing future
Q3 Medicine Pub Date : 2019-01-01 DOI: 10.1080/24709360.2019.1670513
D. Rubin
ABSTRACT Causal inference refers to the process of inferring what would happen in the future if we change what we are doing, or inferring what would have happened in the past, if we had done something different in the distant past. Humans adjust our behaviors by anticipating what will happen if we act in different ways, using past experiences to inform these choices. ‘Essential’ here means in the mathematical sense of excluding the unnecessary and including only the necessary, e.g. stating that the Pythagorean theorem works for an isosceles right triangle is bad mathematics because it includes the unnecessary adjective isosceles; of course this is not as bad as omitting the adjective ‘right.’ I find much of what is written about causal inference to be mathematically inapposite in one of these senses because the descriptions either include irrelevant clutter or omit conditions required for the correctness of the assertions. The history of formal causal inference is remarkable because its correct formulation is so recent, a twentieth century phenomenon, and its future is intriguing because it is currently undeveloped when applied to investigate interventions applied to conscious humans, and moreover will utilize tools impossible without modern computing.
Biostatistics and Epidemiology. 2019;3(1):140–155.
Citations: 27
Variable selection and nonlinear effect discovery in partially linear mixture cure rate models
Q3 Medicine Pub Date : 2019-01-01 DOI: 10.1080/24709360.2019.1663665
A. Masud, Zhangsheng Yu, W. Tu
Survival data with long-term survivors are common in clinical investigations. Such data are often analyzed with mixture cure rate models. Existing model selection procedures do not readily discriminate nonlinear effects from linear ones. Here, we propose a procedure for accommodating nonlinear effects and for determining the cure rate model composition. The procedure is based on the Least Absolute Shrinkage and Selection Operator (LASSO). Specifically, by partitioning each variable into linear and nonlinear components, we use LASSO to select the linear and nonlinear components. Operationally, we model the nonlinear components by cubic B-splines. The procedure adds to existing variable selection methods the ability to discover hidden nonlinear effects in a cure rate model setting. For implementation, we ascertain the maximum likelihood estimates using an Expectation Maximization (EM) algorithm. We conduct an extensive simulation study to assess the operating characteristics of the selection procedure. We illustrate the use of the method by analyzing data from a real clinical study.
Biostatistics and Epidemiology. 2019;3(1):156–177.
Citations: 3
Modeling exposures with a spike at zero: simulation study and practical application to survival data
Q3 Medicine Pub Date : 2019-01-01 DOI: 10.1080/24709360.2019.1580463
E. Lorenz, C. Jenkner, W. Sauerbrei, H. Becher
Risk and prognostic factors in epidemiological and clinical research are often semicontinuous, such that a proportion of individuals have exposure zero while the exposed follow a continuous distribution. We call this a spike at zero (SAZ). Typical examples are consumption of alcohol and tobacco, or hormone receptor levels. To additionally model non-linear functional relationships for SAZ variables, an extension of the fractional polynomial (FP) approach was proposed. To indicate whether or not a value is zero, a binary variable is added to the model. In a two-stage procedure, called FP-spike, it is assessed whether the binary variable and/or the continuous FP function for the positive part is required for a suitable fit. In this paper, we compared the performance of two approaches – standard FP and FP-spike – in the Cox model, in a motivating example on breast cancer prognosis and in a simulation study. The comparisons suggest generally using FP-spike rather than standard FP when the SAZ effect is considerably large, because the method performed better in real data applications and in simulation in terms of deviance and functional form. Abbreviations: CI: confidence interval; FP: fractional polynomial; FP1: first degree fractional polynomial; FP2: second degree fractional polynomial; FSP: function selection procedure; HT: hormone therapy; OR: odds ratio; SAZ: spike at zero
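The two components of the FP-spike construction, a binary zero indicator plus a fractional-polynomial term on the positive part, can be sketched as follows. The helper below is illustrative only: it builds the two columns for a single, fixed FP1 power, whereas the FP-spike procedure searches over FP powers and tests which components are needed:

```python
import math

def fp_spike_features(x, power=1.0):
    """Split a semicontinuous exposure into (zero indicator, FP term on positives).

    By FP convention, power 0 denotes log(x); the FP term is set to 0 for
    unexposed individuals, whose information is carried by the indicator.
    """
    rows = []
    for xi in x:
        if xi == 0:
            rows.append((1, 0.0))
        else:
            rows.append((0, math.log(xi) if power == 0 else xi ** power))
    return rows

# e.g. daily alcohol intake with a spike at zero, using the FP1 power 0.5
features = fp_spike_features([0.0, 0.0, 2.0, 10.0], power=0.5)
print(features[:2])  # unexposed rows: [(1, 0.0), (1, 0.0)]
```

Both columns would then enter the Cox model, and the two-stage test decides whether the indicator, the FP term, or both are retained.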
Biostatistics and Epidemiology. 2019;3(1):23–37.
Citations: 6
A frequentist mixture modeling of stop signal reaction times
Q3 Medicine Pub Date : 2019-01-01 DOI: 10.1080/24709360.2019.1660110
M. Soltanifar, A. Dupuis, R. Schachar, M. Escobar
The stop signal reaction time (SSRT), a measure of the latency of the stop signal process, was theoretically formulated using a horse race model of go and stop signal processes by the American scientist Gordon Logan (1994). The SSRT assumes equal impact of the preceding trial type (go/stop) on its measurement. When this assumption is violated, we consider estimation of SSRT based on the idea of an earlier analysis of cluster-type go reaction times (GORT) and linear mixed model (LMM) data analysis results. Two clusters of trials were considered: trials preceded by a go trial and trials preceded by a stop trial. Given disparities between cluster-type SSRTs, we need new indexes that incorporate the otherwise unused cluster-type information in the calculations. We introduce mixture SSRT and weighted SSRT as two new distinct indexes of SSRT that address the violated assumption. Mixture SSRT and weighted SSRT are theoretically asymptotically equivalent under special conditions. An example of stop signal task (SST) real data is presented to show the equivalency of these two new SSRT indexes and their larger magnitude compared to Logan's single 1994 SSRT. Abbreviations: ADHD: attention deficit hyperactivity disorder; ExG: Ex-Gaussian distribution; GORT: reaction time in a go trial; GORTA: reaction time in a type A go trial; GORTB: reaction time in a type B go trial; LMM: linear mixed model; SWAN: strengths and weaknesses of ADHD symptoms and normal behavior rating scale; SSD: stop signal delay; SR: signal respond; SRRT: reaction time in a failed stop trial; SSRT: stop signal reaction time in a stop trial; SST: stop signal task.
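As a rough illustration of the weighting idea (a hypothetical sketch with made-up numbers; the paper's exact definitions of the mixture and weighted SSRT indexes are more involved), a weighted SSRT can be formed as a convex combination of the two cluster-type estimates:

```python
def weighted_ssrt(ssrt_after_go, n_after_go, ssrt_after_stop, n_after_stop):
    """Convex combination of the two cluster-type SSRT estimates,
    weighted by the proportion of trials in each cluster."""
    n = n_after_go + n_after_stop
    return (n_after_go * ssrt_after_go + n_after_stop * ssrt_after_stop) / n

# Hypothetical cluster-type SSRT estimates (ms) and trial counts
print(weighted_ssrt(220.0, 150, 260.0, 50))  # 230.0
```

If the two cluster-type SSRTs were equal, the weighted index would reduce to Logan's single SSRT, which is the sense in which the new indexes extend rather than replace the 1994 estimate.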
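The cluster-wise estimation described above can be sketched in code. The snippet below applies Logan's standard integration method (SSRT = the p-th quantile of the go-RT distribution minus the mean stop signal delay, where p is the probability of responding on a stop trial) separately to the two clusters of trials, then combines them. The synthetic data, variable names, and the weighting by each cluster's share of stop trials are illustrative assumptions, not the authors' exact formulation of the mixture or weighted SSRT.

```python
# Sketch: cluster-wise SSRT via Logan's integration method, then a
# weighted combination of the two cluster estimates. Data are synthetic.
import numpy as np

def ssrt_integration(go_rt, ssd, p_respond):
    """SSRT = p-th quantile of the go-RT distribution minus mean SSD,
    where p is the probability of responding on a stop trial."""
    go_rt = np.asarray(go_rt, dtype=float)
    finish_time = np.quantile(go_rt, p_respond)  # stop-process finishing time
    return finish_time - np.mean(ssd)

# Two clusters: trials preceded by a go trial (A) vs. by a stop trial (B)
rng = np.random.default_rng(1)
go_rt_A = rng.normal(500, 60, 200)   # go RTs in ms (synthetic)
go_rt_B = rng.normal(540, 60, 200)   # slower after a stop trial (assumed)
ssd_A = np.full(50, 250.0)           # stop signal delays in ms (synthetic)
ssd_B = np.full(50, 250.0)

ssrt_A = ssrt_integration(go_rt_A, ssd_A, p_respond=0.5)
ssrt_B = ssrt_integration(go_rt_B, ssd_B, p_respond=0.5)

# Weight each cluster by its share of stop trials (illustrative choice)
w_A = len(ssd_A) / (len(ssd_A) + len(ssd_B))
ssrt_weighted = w_A * ssrt_A + (1 - w_A) * ssrt_B
print(ssrt_A, ssrt_B, ssrt_weighted)
```

Because it is a convex combination, the weighted estimate always lies between the two cluster-wise SSRTs, whereas a single pooled SSRT ignores the preceding-trial-type effect entirely.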
Biostatistics and Epidemiology, vol. 3, no. 1, pp. 90–108 (2019). DOI: 10.1080/24709360.2019.1660110
Citations: 5
Journal
Biostatistics and Epidemiology