Simultaneous Feature Selection and Tuple Selection for Efficient Classification

M. Dash, V. Gopalkrishnan
{"title":"同时特征选择和元组选择的高效分类","authors":"M. Dash, V. Gopalkrishnan","doi":"10.4018/978-1-60566-748-5.CH012","DOIUrl":null,"url":null,"abstract":"It is no longer news that data are increasing very rapidly day-by-day. Particularly with Internet becoming so prevalent everywhere, the sources of data have become numerous. Data are increasing in both ways: dimensions or features and instances or examples or tuples, not all the data are relevant though. While gathering the data on any particular aspect, usually one tends to gather as much information as will be required for various tasks. One may not explicitly have any particular task, for example classification, in mind. So, it behooves for a data mining expert to remove the noisy, irrelevant and redundant data before proceeding with classification because many traditional algorithms fail in the presence of such noisy and irrelevant data (Blum and Langley 1997). As an example, consider microarray gene expression data where there are thousands of features (or genes) and only 10s of tuples (or sample tests). For example, Leukemia cancer data (Alon, Barkai et al. 1999) has 7129 genes and 72 sample tests. It has been shown that even with very few genes one can achieve the same or even better prediction acABStrAct","PeriodicalId":255230,"journal":{"name":"Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Simultaneous Feature Selection and Tuple Selection for Efficient Classification\",\"authors\":\"M. Dash, V. Gopalkrishnan\",\"doi\":\"10.4018/978-1-60566-748-5.CH012\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"It is no longer news that data are increasing very rapidly day-by-day. 
Particularly with Internet becoming so prevalent everywhere, the sources of data have become numerous. Data are increasing in both ways: dimensions or features and instances or examples or tuples, not all the data are relevant though. While gathering the data on any particular aspect, usually one tends to gather as much information as will be required for various tasks. One may not explicitly have any particular task, for example classification, in mind. So, it behooves for a data mining expert to remove the noisy, irrelevant and redundant data before proceeding with classification because many traditional algorithms fail in the presence of such noisy and irrelevant data (Blum and Langley 1997). As an example, consider microarray gene expression data where there are thousands of features (or genes) and only 10s of tuples (or sample tests). For example, Leukemia cancer data (Alon, Barkai et al. 1999) has 7129 genes and 72 sample tests. It has been shown that even with very few genes one can achieve the same or even better prediction acABStrAct\",\"PeriodicalId\":255230,\"journal\":{\"name\":\"Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development\",\"volume\":\"39 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4018/978-1-60566-748-5.CH012\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Complex Data Warehousing 
and Knowledge Discovery for Advanced Retrieval Development","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4018/978-1-60566-748-5.CH012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

It is no longer news that data are growing rapidly day by day. With the Internet now prevalent everywhere, the sources of data have become numerous. Data are growing in both directions: in dimensions (features) and in instances (examples, or tuples); not all of these data are relevant, though. When gathering data on any particular aspect, one usually tends to collect as much information as might be required for various tasks, without necessarily having any particular task, such as classification, in mind. It therefore behooves a data mining expert to remove noisy, irrelevant, and redundant data before proceeding with classification, because many traditional algorithms fail in the presence of such data (Blum and Langley 1997). As an example, consider microarray gene expression data, which contain thousands of features (genes) but only tens of tuples (sample tests). The leukemia cancer data (Alon, Barkai et al. 1999), for instance, have 7129 genes and 72 sample tests. It has been shown that even with very few genes one can achieve the same or even better prediction accuracy.