Handling Class Imbalance in High-Dimensional Biomedical Datasets
B. Pes
2019 IEEE 28th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), published 2019-06-01
DOI: 10.1109/WETICE.2019.00040 (https://doi.org/10.1109/WETICE.2019.00040)
Citations: 4

Abstract

When dealing with biomedical data, the first and most challenging issue is often the very high dimensionality, i.e. the presence of a very large number of features for each problem instance. A vast literature is available on dimensionality reduction techniques suitable for this kind of data, with a special focus on feature selection algorithms that discard uninformative features. In most cases, however, the dimensionality issue is addressed without jointly considering other potential problems in the data, including an imbalanced class distribution that may hinder the construction of effective classification models. Class imbalance, in turn, has mostly been treated in the literature as an independent problem, especially in application fields where the number of features is not so critical. But several biomedical datasets are both high-dimensional and class-imbalanced, so there is a strong need to design and evaluate learning strategies that can properly deal with both issues simultaneously. In this work, we experiment with feature selection techniques in conjunction with sampling-based class balancing methods and cost-sensitive classification, in order to gain insight into the most effective strategies for dealing with such complex data.
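
The three ingredients named in the abstract (feature selection, sampling-based balancing, cost-sensitive weighting) can be illustrated with a minimal standard-library sketch. This is not the paper's experimental setup: the toy data generator, the univariate ranking score, the choice to balance before selecting features, and every function name below are assumptions made purely for illustration.

```python
import random
from statistics import mean, pstdev

def make_data(n_major=90, n_minor=10, n_features=50, n_informative=3, seed=0):
    """Toy imbalanced, high-dimensional data (hypothetical, illustration only)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n in ((0, n_major), (1, n_minor)):
        for _ in range(n):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in range(n_informative):
                row[j] += 2.0 * label  # shift informative features for class 1
            X.append(row)
            y.append(label)
    return X, y

def undersample(X, y, seed=0):
    """Random undersampling: keep every minority row (label 1, assumed to be
    the smaller class) and an equally sized random subset of majority rows."""
    rng = random.Random(seed)
    minority = [i for i, lab in enumerate(y) if lab == 1]
    majority = [i for i, lab in enumerate(y) if lab == 0]
    keep = minority + rng.sample(majority, len(minority))
    return [X[i] for i in keep], [y[i] for i in keep]

def rank_features(X, y, k):
    """Univariate filter: score each feature by the absolute difference of
    class means, scaled by the feature's overall spread; return the top k indices."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        spread = pstdev(pos + neg) or 1.0  # guard against constant features
        scores.append((abs(mean(pos) - mean(neg)) / spread, j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def class_weights(y):
    """Cost-sensitive alternative to resampling: weight each class
    inversely to its frequency, so minority errors cost more."""
    n = len(y)
    return {c: n / (2 * y.count(c)) for c in set(y)}

# Usage: balance first, then select features on the balanced sample.
# The opposite order is equally plausible; the ordering here is an assumption.
X, y = make_data()
X_bal, y_bal = undersample(X, y)
selected = rank_features(X_bal, y_bal, k=5)
X_reduced = [[row[j] for j in selected] for row in X_bal]
weights = class_weights(y)  # would be passed to a classifier instead of resampling
```

The weights returned by `class_weights` stand in for the cost-sensitive route: rather than discarding majority instances, a classifier that accepts per-class costs would penalize minority-class errors more heavily.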