Finding Optimal Normalizing Transformations via bestNormalize

The R Journal. Pub Date: 2021-01-01. DOI: 10.32614/rj-2021-041

Ryan A. Peterson

Citations: 111

Abstract

The bestNormalize R package was designed to help users find a transformation that can effectively normalize a vector regardless of its actual distribution. Each of the many normalization techniques that have been developed has its own strengths and weaknesses, and deciding which to use before the data are fully observed is difficult or impossible. The package facilitates choosing among a range of possible transformations and automatically returns the best one, i.e., the one that makes the data look the most normal. To evaluate and compare normalization efficacy across a suite of possible transformations, we developed a statistic based on a goodness-of-fit test statistic divided by its degrees of freedom. Transformations can be seamlessly trained and applied to newly observed data, and can be used in conjunction with caret and recipes for data preprocessing in machine learning workflows. Custom transformations and normalization statistics are supported.
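The selection idea described in the abstract can be illustrated with a minimal sketch. The package itself is written in R; the Python code below is not its implementation, only an assumption-laden illustration of scoring each candidate transformation by a goodness-of-fit statistic divided by its degrees of freedom and keeping the lowest score. The candidate set, the function names (`standardize`, `pearson_stat_per_df`, `best_normalize_sketch`), the choice of 10 equiprobable bins, and the degrees of freedom (bins minus 3) are all hypothetical choices made for this example.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def standardize(values):
    """Center and scale values to mean 0 and standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pearson_stat_per_df(z, n_bins=10):
    """Pearson chi-square statistic against N(0, 1), divided by its df.

    Bins are equiprobable under the standard normal. The bin count and
    df = n_bins - 3 (one for the total, two for the estimated mean and
    standard deviation) are assumptions made for this sketch.
    """
    std_normal = NormalDist()
    counts = [0] * n_bins
    for v in z:
        # The normal CDF maps v into [0, 1]; equal-width cells on that
        # scale correspond to equiprobable bins under N(0, 1).
        idx = min(int(std_normal.cdf(v) * n_bins), n_bins - 1)
        counts[idx] += 1
    expected = len(z) / n_bins
    chi_sq = sum((c - expected) ** 2 / expected for c in counts)
    return chi_sq / (n_bins - 3)


# Hypothetical candidate set; all three require strictly positive input.
CANDIDATES = {
    "identity": lambda v: v,
    "sqrt": math.sqrt,
    "log": math.log,
}


def best_normalize_sketch(x):
    """Score every candidate and return (best_name, all_scores)."""
    scores = {}
    for name, transform in CANDIDATES.items():
        z = standardize([transform(v) for v in x])
        scores[name] = pearson_stat_per_df(z)
    best = min(scores, key=scores.get)  # lower score = closer to normal
    return best, scores


# Simulated right-skewed data: the log of a log-normal sample is normal.
rng = random.Random(0)
x = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
best, scores = best_normalize_sketch(x)
print(best, scores)
```

Because the simulated data are log-normal, the log transformation should score lowest here. Dividing by the degrees of freedom puts the statistic on a comparable scale when the number of bins differs, which is the motivation the abstract gives for a ratio rather than a raw test statistic.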


