Bootstrap Method as a Tool for Analyzing Data with Atypical Distributions Deviating from Parametric Assumptions: Critique and Effectiveness Evaluation
Joanna Kostanek, K. Karolczak, W. Kuliczkowski, Cezary Watała
Data, published 2024-07-26. DOI: 10.3390/data9080095 (https://doi.org/10.3390/data9080095)
Abstract
In a research environment marked by rapidly growing data volume and complexity, choosing statistical tests that fit the research objectives and the data distributions is essential for rigorous analysis and accurate interpretation. This article examines bootstrapping as a technique for multiple-comparison analysis: it estimates sampling distributions from the data without assuming a population distribution, and so offers a flexible alternative to traditional methods in a variety of data scenarios. Computer simulations were conducted using data from cardiovascular disease patients. Two approaches, spontaneous partly controlled simulation and fully constrained simulation using self-written R scripts, were used to generate datasets with specified distributions, which were then analyzed with tests for comparing more than two groups. The bootstrap method substantially improved the statistical analysis, particularly by overcoming the constraints of conventional parametric tests. It proved effective when comparing multiple scenarios and yielded robust findings across diverse distributions, albeit with slightly inflated p values. As a substitute for parametric approaches, the bootstrap encourages caution when rejecting hypotheses, which supports a closer understanding of statistical nuances and greater analytical rigor.
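The abstract describes bootstrap tests for comparing more than two groups without assuming a population distribution. The sketch below illustrates one common way to do this, and is not the authors' R implementation: a null-centered bootstrap of the one-way ANOVA F statistic, written in Python with NumPy. The function names `f_statistic` and `bootstrap_anova`, the number of resamples, and the simulated exponential data are all assumptions made for illustration.

```python
import numpy as np


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of 1-D arrays."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def bootstrap_anova(groups, n_boot=2000, seed=0):
    """Bootstrap p value for equality of group means (hypothetical helper).

    Each group is centered on its own mean so that the null hypothesis of
    equal means holds in the resampling world, then resampled with
    replacement within the group.
    """
    rng = np.random.default_rng(seed)
    groups = [np.asarray(g, dtype=float) for g in groups]
    f_obs = f_statistic(groups)

    # Impose the null hypothesis: every centered group has mean zero.
    centered = [g - g.mean() for g in groups]

    exceed = 0
    for _ in range(n_boot):
        resampled = [rng.choice(c, size=len(c), replace=True) for c in centered]
        if f_statistic(resampled) >= f_obs:
            exceed += 1

    # The +1 terms keep the p value strictly above zero.
    p_value = (exceed + 1) / (n_boot + 1)
    return f_obs, p_value


if __name__ == "__main__":
    # Simulated right-skewed data (assumed for illustration only).
    rng = np.random.default_rng(42)
    data = [rng.exponential(1.0, 30) + shift for shift in (0.0, 0.0, 0.8)]
    f_obs, p_value = bootstrap_anova(data)
    print(f"F = {f_obs:.3f}, bootstrap p = {p_value:.4f}")
```

Because the reference distribution of F is built from the observed data instead of the theoretical F distribution, the test does not rely on normality, which is the property the abstract highlights for atypically distributed data.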