EDLT: Enabling Deep Learning for Generic Data Classification

2018 IEEE International Conference on Data Mining (ICDM). Pub Date: 2018-11-01. DOI: 10.1109/ICDM.2018.00030

Huimei Han, Xingquan Zhu, Ying Li


Citations: 6

Abstract

This paper proposes to enable deep learning for generic machine learning tasks. Our goal is to allow deep learning to be applied to data that are already represented in an instance-feature tabular format, in order to achieve better classification accuracy. Because deep learning relies on spatial/temporal correlation to learn new feature representations, our approach is to convert each instance of the original dataset into a synthetic matrix format, so as to take full advantage of the feature learning power of deep learning methods. To maximize the correlation within the matrix, we use 0/1 optimization to reorder features such that strongly correlated features are adjacent to each other. By using a two-dimensional feature reordering, we are able to create a synthetic matrix, treated as an image, to represent each instance. Because the synthetic image preserves the original feature values and data correlation, existing deep learning algorithms, such as convolutional neural networks (CNN), can be applied to learn effective features for classification. Our experiments on 20 generic datasets, using CNN as the deep learning classifier, confirm that enabling deep learning on generic datasets yields a clear performance gain over generic machine learning methods. In addition, the proposed method consistently outperforms simple baselines that apply CNN to generic datasets. As a result, our research allows deep learning to be broadly applied to generic datasets for learning and classification. Algorithm source code is available at http://github.com/hhmzwc/EDLT.
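The idea of turning a tabular instance into a correlation-aware synthetic image can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract describes a 0/1 optimization and a two-dimensional reordering, whereas the code below substitutes a simple greedy one-dimensional ordering that is then folded row by row into a square matrix. The function names `greedy_feature_order` and `to_synthetic_images`, the zero padding, and the square layout are all assumptions made for illustration.

```python
import numpy as np


def greedy_feature_order(X):
    """Return a feature ordering that places correlated features side by side.

    ASSUMPTION: a greedy nearest-neighbour chain stands in for the paper's
    0/1 optimization, which the abstract does not specify in detail.
    """
    # Absolute Pearson correlation between every pair of feature columns.
    corr = np.abs(np.atleast_2d(np.corrcoef(X, rowvar=False)))
    corr = np.nan_to_num(corr)        # constant columns give NaN correlations
    np.fill_diagonal(corr, -1.0)      # a feature is never its own neighbour

    # Start from one member of the most strongly correlated pair.
    start = int(np.unravel_index(np.argmax(corr), corr.shape)[0])
    order = [start]
    remaining = set(range(corr.shape[0])) - {start}

    # Repeatedly append the unused feature most correlated with the last one.
    while remaining:
        last = order[-1]
        nxt = max(sorted(remaining), key=lambda j: corr[last, j])
        order.append(nxt)
        remaining.remove(nxt)
    return order


def to_synthetic_images(X, order, side=None):
    """Fold reordered feature vectors into square matrices, one per instance.

    ASSUMPTION: unused cells are zero-padded and rows are filled left to
    right; the paper's own layout may differ.
    """
    n_samples, n_features = X.shape
    if side is None:
        side = int(np.ceil(np.sqrt(n_features)))
    padded = np.zeros((n_samples, side * side), dtype=X.dtype)
    padded[:, :n_features] = X[:, order]   # original feature values are kept
    return padded.reshape(n_samples, side, side)
```

The result has shape `(n_samples, side, side)`. Most CNN libraries also expect a channel axis, which can be added with `images[..., np.newaxis]` or `images[:, np.newaxis]` depending on the framework's layout convention.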
