Effects of influential points and sample size on the selection and replicability of multivariable fractional polynomial models.

Diagnostic and prognostic research (IF 2.6), vol. 7(1), p. 7; Pub Date: 2023-04-18; DOI: 10.1186/s41512-023-00145-1

Willi Sauerbrei, Edwin Kipruto, James Balmford



Abstract

Background: The multivariable fractional polynomial (MFP) approach combines variable selection using backward elimination with a function selection procedure (FSP) for fractional polynomial (FP) functions. It is a relatively simple approach that can be understood without advanced training in statistical modeling. For each continuous variable, a closed test procedure is used to decide between no effect, a linear function, an FP1 function, or an FP2 function. Influential points (IPs) and small sample sizes can both have a strong impact on the selected functions and on the resulting MFP model.
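As a rough illustration of the FP1 and FP2 function classes named above, the following sketch may help. It assumes the conventional FP power set {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, with power 0 read as log(x) and a repeated FP2 power giving the pair x^p and x^p log(x); the abstract itself does not spell these out. The helper names (fp_term, fp2_design, rss, best_fp1) are hypothetical, ordinary least squares is used as the fitting criterion, and the closed test procedure is not implemented.

```python
import numpy as np

# Assumed conventional FP power set; power 0 denotes log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x, p):
    """Return x**p, with p == 0 meaning log(x). Requires x > 0."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if p == 0 else x ** p


def fp2_design(x, p1, p2):
    """Design matrix [1, x^p1, x^p2]; a repeated power uses x^p * log(x)."""
    x = np.asarray(x, dtype=float)
    t1 = fp_term(x, p1)
    t2 = fp_term(x, p2) if p1 != p2 else t1 * np.log(x)
    return np.column_stack([np.ones_like(t1), t1, t2])


def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def best_fp1(x, y):
    """Power in FP_POWERS whose FP1 fit (intercept + one term) has the smallest RSS."""
    y = np.asarray(y, dtype=float)

    def fit(p):
        t = fp_term(x, p)
        return rss(np.column_stack([np.ones_like(t), t]), y)

    return min(FP_POWERS, key=fit)
```

For example, calling best_fp1(x, y) on positive x values returns the single power that fits best; a full FSP would then compare the best FP2, best FP1, linear, and null fits with formal tests.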

Methods: We used simulated data with six continuous and four categorical predictors to illustrate approaches that can help to identify IPs with an influence on function selection and on the MFP model. The approaches use leave-one-out or leave-two-out deletion and two related techniques for a multivariable assessment. In eight subsamples, we also investigated the effect of sample size and the replicability of the selected models, the latter by using three non-overlapping subsamples of the same size. For better illustration, a structured profile was used to provide an overview of all analyses conducted.
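The leave-one-out idea can be sketched for a single continuous predictor as follows. This is a simplified, univariable stand-in and not the authors' procedure: it refits only the best FP1 power (by least squares, over an assumed conventional power set with 0 meaning log) after deleting each observation in turn, and flags observations whose deletion changes the selected power. The function names are hypothetical.

```python
import numpy as np

# Assumed conventional FP power set; power 0 denotes log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def best_fp1_power(x, y):
    """Return the FP1 power with the smallest least-squares RSS (x must be > 0)."""
    best_p, best_rss = None, np.inf
    for p in FP_POWERS:
        t = np.log(x) if p == 0 else x ** p
        X = np.column_stack([np.ones_like(t), t])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        value = float(resid @ resid)
        if value < best_rss:
            best_p, best_rss = p, value
    return best_p


def leave_one_out_powers(x, y):
    """Return (power on full data, {index: power selected without that index}).

    Only observations whose deletion changes the selected power are listed;
    these are candidates for influential points on function selection.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    full = best_fp1_power(x, y)
    keep = np.ones(len(x), dtype=bool)
    changed = {}
    for i in range(len(x)):
        keep[i] = False                       # delete observation i
        p = best_fp1_power(x[keep], y[keep])  # reselect the power without it
        keep[i] = True                        # restore it for the next round
        if p != full:
            changed[i] = p
    return full, changed
```

An empty dictionary suggests that no single observation drives the selected power; a non-empty one points to observations worth inspecting before a function is accepted. Leave-two-out deletion would loop over pairs of indices in the same way.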

Results: The results showed that one or more IPs can drive the functions and models selected. In addition, with a small sample size, MFP was not able to detect some non-linear functions and the selected model differed substantially from the true underlying model. However, when the sample size was relatively large and regression diagnostics were carefully conducted, MFP selected functions or models that were similar to the underlying true model.

Conclusions: For smaller sample sizes, IPs and low power are important reasons why the MFP approach may not be able to identify underlying functional relationships for continuous variables, and why selected models might differ substantially from the true model. However, for larger sample sizes, a carefully conducted MFP analysis is often a suitable way to select a multivariable regression model which includes continuous variables. In such a case, MFP can be the preferred approach to derive a multivariable descriptive model.


