PALLADIO: A Parallel Framework for Robust Variable Selection in High-Dimensional Data

Matteo Barbieri, Samuele Fiorini, Federico Tomasi, A. Barla
Published in: 2016 6th Workshop on Python for High-Performance and Scientific Computing (PyHPC)
Publication date: 2016-11-13
DOI: 10.1109/PYHPC.2016.13 (https://doi.org/10.1109/PYHPC.2016.13)
Citations: 7

Abstract

The main goal of supervised data analytics is to model a target phenomenon from a limited number of samples, each represented by an arbitrarily large number of variables. When the number of variables is much larger than the number of available samples, variable selection becomes a key step, as it identifies a possibly reduced subset of relevant variables that describe the observed phenomenon. Obtaining interpretable and reliable results in this highly indeterminate scenario is often a non-trivial task. In this work we present PALLADIO, a framework designed for HPC cluster architectures that provides robust variable selection in high-dimensional problems. PALLADIO is developed in Python and integrates CUDA kernels to reduce the computational time of several independent element-wise operations. The scalability of the proposed framework is assessed on synthetic data of different sizes that represent realistic scenarios.
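To make the setting concrete, the following is a minimal sketch of variable selection when variables far outnumber samples. It is not PALLADIO's algorithm, which the abstract does not detail. It assumes a simple stand-in technique: rank variables by absolute correlation with the target on repeated random half-subsamples, then keep the variables that rank highly in most rounds. The function names, the synthetic data, and the thresholds (`top_k=10`, frequency `0.6`) are all hypothetical choices made for illustration.

```python
import numpy as np


def selection_frequencies(X, y, n_rounds=100, top_k=10, seed=0):
    """Fraction of subsampling rounds in which each variable ranks in the top_k.

    Illustrative stand-in only: ranking is by absolute Pearson correlation,
    which is an assumption and not the method used by PALLADIO.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_vars = X.shape
    counts = np.zeros(n_vars)
    for _ in range(n_rounds):
        # Draw half of the samples without replacement.
        idx = rng.choice(n_samples, size=n_samples // 2, replace=False)
        Xs = X[idx] - X[idx].mean(axis=0)
        ys = y[idx] - y[idx].mean()
        # Absolute Pearson correlation of every variable with the target.
        denom = np.linalg.norm(Xs, axis=0) * np.linalg.norm(ys)
        score = np.abs(Xs.T @ ys) / np.maximum(denom, 1e-12)
        # Count the top_k highest-scoring variables in this round.
        counts[np.argsort(score)[-top_k:]] += 1
    return counts / n_rounds


def make_synthetic(n_samples=100, n_vars=500, seed=0):
    """Hypothetical data: only variables 0 and 1 influence the target."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_vars))
    y = X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(n_samples)
    return X, y


X, y = make_synthetic()
freq = selection_frequencies(X, y)
# Keep variables that ranked in the top_k in at least 60% of the rounds.
selected = np.flatnonzero(freq >= 0.6)
print(selected)
```

Each subsampling round is independent of the others, which is the kind of workload that can be spread across the nodes of a cluster. How PALLADIO actually distributes its computation is not described in the abstract.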