Finding Optimal Normalizing Transformations via bestNormalize

The R Journal. Publication date: 2021-01-01. DOI: 10.32614/rj-2021-041
Ryan A. Peterson

Abstract

The bestNormalize R package was designed to help users find a transformation that can effectively normalize a vector regardless of its actual distribution. Each of the many normalization techniques that have been developed has its own strengths and weaknesses, and deciding which to use before the data are fully observed is difficult or impossible. The package facilitates choosing among a range of candidate transformations and automatically returns the best one, i.e., the one that makes the data look the most normal. To evaluate and compare normalization efficacy across a suite of candidate transformations, we developed a statistic based on a goodness-of-fit test statistic divided by its degrees of freedom. Transformations can be seamlessly trained and then applied to newly observed data, and they can be used in conjunction with the caret and recipes packages for data preprocessing in machine learning workflows. Custom transformations and normalization statistics are supported.
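
The selection idea described in the abstract can be illustrated with a small, language-neutral sketch. This is not the package's implementation: it is written in Python rather than R, and the candidate set, the Pearson chi-square test with a `ceil(2 * n^0.4)` bin count, the `bins - 3` degrees of freedom, and the standardization with training-set mean and standard deviation are all assumptions made for illustration. The names `pearson_p_over_df`, `CANDIDATES`, and `FittedNormalizer` are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def pearson_p_over_df(values):
    """Pearson chi-square normality statistic divided by its degrees of freedom."""
    n = len(values)
    mu, sigma = mean(values), stdev(values)
    n_bins = math.ceil(2 * n ** 0.4)  # bin-count heuristic (assumed for this sketch)
    std_normal = NormalDist()
    counts = [0] * n_bins
    for v in values:
        # Map each value to [0, 1] under the fitted normal, so that the bins
        # are equally probable if the data really are normal.
        u = std_normal.cdf((v - mu) / sigma)
        counts[min(int(u * n_bins), n_bins - 1)] += 1
    expected = n / n_bins
    p_stat = sum((c - expected) ** 2 / expected for c in counts)
    df = n_bins - 3  # bins - 1, minus 2 estimated parameters (assumed)
    return p_stat / df


# Hypothetical candidate set; all of these require positive inputs here.
CANDIDATES = {
    "identity": lambda x: x,
    "log": math.log,
    "sqrt": math.sqrt,
    "arcsinh": math.asinh,
}


class FittedNormalizer:
    """Chooses the candidate with the lowest statistic on the training data."""

    def __init__(self, train):
        self.scores = {
            name: pearson_p_over_df([f(v) for v in train])
            for name, f in CANDIDATES.items()
        }
        self.name = min(self.scores, key=self.scores.get)
        transformed = [CANDIDATES[self.name](v) for v in train]
        # Store the training-set center and scale so new data are treated identically.
        self.center, self.scale = mean(transformed), stdev(transformed)

    def transform(self, new_values):
        f = CANDIDATES[self.name]
        return [(f(v) - self.center) / self.scale for v in new_values]


rng = random.Random(1)
train = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
fitted = FittedNormalizer(train)
new_scaled = fitted.transform([0.5, 1.0, 2.0])  # apply the trained transform to new data
```

Dividing the test statistic by its degrees of freedom puts candidates on a comparable scale, with values near 1 suggesting that the transformed data are consistent with normality. Keeping the chosen transformation and its training-set parameters in one object is what allows it to be reapplied unchanged to newly observed data.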