OTrecod: An R Package for Data Fusion using Optimal Transportation Theory

R J., vol. 43, no. 1, pp. 195-222. Pub Date: 2023-02-10. DOI: 10.32614/rj-2023-006
G. Guernec, Valérie Garès, J. Omer, Philippe Saint-Pierre, N. Savy

Abstract

Advances in information technology often confront users with large amounts of data that must be easy to integrate. In this context, creating a single database from multiple separate data sources is attractive but complex when the same information of interest is stored in at least two distinct encodings. In this situation, merging the data sources consists of finding a common recoding scale to fill in the incomplete information in a synthetic database. The OTrecod package provides R users with two functions dedicated to solving this recoding problem using optimal transportation theory. Specific arguments of these functions enrich the algorithms by relaxing distributional constraints or adding a regularization term to make the data fusion more flexible. The OTrecod package also provides a set of support functions dedicated to harmonizing separate data sources, handling incomplete information, and selecting matching variables. This paper gives all the keys to quickly understand and master the original algorithms implemented in the OTrecod package, guiding users step by step through their data fusion projects.
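
To make the idea of a regularized transport plan between two encodings concrete, here is a minimal sketch in plain Python. It is not the OTrecod implementation and does not use its API: the function name `sinkhorn`, the two marginal distributions, and the cost matrix are all invented for illustration. It shows the generic Sinkhorn iteration for entropy-regularized optimal transport, under the assumption that one source records a variable on a 3-level scale and the other on a 2-level scale.

```python
import math

def sinkhorn(a, b, cost, reg=0.5, n_iter=500):
    """Entropy-regularized transport plan between histograms a and b.

    Hypothetical helper for illustration only; not part of OTrecod.
    """
    n, m = len(a), len(b)
    # Gibbs kernel: a smaller cost gives a larger kernel weight.
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Rescale rows so that the plan's row sums approach a.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the plan's column sums match b.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Assumed toy data: level frequencies of the same information in two encodings.
a = [0.5, 0.3, 0.2]          # 3-level scale observed in source A
b = [0.6, 0.4]               # 2-level scale observed in source B
# Assumed cost of pairing level i of A with level j of B.
cost = [[0.0, 1.0],
        [0.5, 0.5],
        [1.0, 0.0]]

plan = sinkhorn(a, b, cost)
for row in plan:
    print([round(x, 3) for x in row])
```

Each row of `plan`, divided by its row sum, can be read as a conditional distribution over the second encoding given a level of the first, which is one way a transport plan can suggest a recoding. Lowering `reg` brings the plan closer to the unregularized optimum, while raising it spreads mass more evenly.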