Evolutionary-based feature construction with substitution for data summarization using DARA

F. Sia, R. Alfred
2012 4th Conference on Data Mining and Optimization (DMO), 15 October 2012. DOI: 10.1109/DMO.2012.6329798
Citations: 10

Abstract

The representation of the input data set is important for a learning task. In data summarization, the way the multiple instances stored in non-target tables (which have a many-to-one relationship with the records stored in the target table) are represented influences the descriptive accuracy of the summarized data. If the summarized data is fed into a classifier as one of its input features, the predictive accuracy of the classifier is affected as well. This paper proposes an evolutionary-based feature construction approach, named Fixed-Length Feature Construction with Substitution (FLFCWS), that addresses this problem by optimizing feature construction for relational data summarization. The approach allows an initial feature to be used more than once when building new constructed features, in order to exploit all possible interactions among attributes, and it applies a genetic algorithm to find a relevant set of features. The constructed features are then used to generate relevant patterns that characterize the non-target records associated with each target record; these patterns serve as the input representation for the data summarization process. Several feature scoring measures are used as fitness functions to find the best set of constructed features. The experimental results show an improvement in predictive accuracy when classifying data summarized with the FLFCWS approach, which indirectly improves the descriptive accuracy of the summarized data. This shows that the FLFCWS approach can generate a promising set of constructed features to describe the characteristics of non-target records for data summarization.
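The abstract does not specify the chromosome encoding, genetic operators, or exact scoring measures, so the following is only a minimal Python sketch of the general idea under stated assumptions. Each individual is a fixed number of constructed features, and each feature is a fixed-length tuple of attribute indices whose values are concatenated into one pattern. "With substitution" is modelled by drawing each feature's attributes independently, so the same attribute may appear in several constructed features. Fitness is assumed to be the mean information gain of the constructed features (one possible feature scoring measure), with every non-target record labelled by the class of its target record. The operators (truncation selection, one-point crossover, mutation) and all function names (`evolve`, `summarize`, and so on) are hypothetical. `summarize` only builds per-target bags of patterns; the later weighting and clustering stage of a DARA-style pipeline is not shown.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of one categorical feature with respect to the labels."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def construct(record, feature):
    """Value of one constructed feature: the chosen attribute values joined together."""
    return "_".join(str(record[i]) for i in feature)


def random_feature(n_attrs, length, rng):
    # Fixed length: exactly `length` distinct attributes inside one feature.
    return tuple(sorted(rng.sample(range(n_attrs), length)))


def random_individual(n_attrs, n_features, length, rng):
    # "With substitution" (assumed encoding): features are drawn independently,
    # so an attribute may be reused across several constructed features.
    return [random_feature(n_attrs, length, rng) for _ in range(n_features)]


def fitness(individual, records, labels):
    # Assumed scoring measure: mean information gain of the constructed features.
    gains = [information_gain([construct(r, f) for r in records], labels) for f in individual]
    return sum(gains) / len(gains)


def crossover(a, b, rng):
    # One-point crossover over the list of constructed features.
    cut = rng.randrange(1, len(a)) if len(a) > 1 else 0
    return a[:cut] + b[cut:]


def mutate(individual, n_attrs, length, rng):
    # Replace one randomly chosen constructed feature with a fresh random one.
    child = list(individual)
    child[rng.randrange(len(child))] = random_feature(n_attrs, length, rng)
    return child


def evolve(records, labels, n_features=2, length=2, pop_size=20, generations=30, seed=0):
    """Simple generational GA with truncation selection (hypothetical settings)."""
    rng = random.Random(seed)
    n_attrs = len(records[0])
    pop = [random_individual(n_attrs, n_features, length, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, records, labels), reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), n_attrs, length, rng))
        pop = elite + children
    return max(pop, key=lambda ind: fitness(ind, records, labels))


def summarize(non_target, individual):
    """Group the constructed patterns of non-target rows by their target id."""
    bags = {}
    for target_id, record in non_target:
        patterns = [f"F{k}={construct(record, f)}" for k, f in enumerate(individual)]
        bags.setdefault(target_id, Counter()).update(patterns)
    return bags


# Toy many-to-one data (hypothetical): (target_id, non-target attribute values).
non_target = [
    (1, ("y", "c", "p")), (1, ("y", "c", "q")),
    (2, ("n", "c", "p")), (2, ("n", "c", "q")),
]
target_labels = {1: "pos", 2: "neg"}
records = [r for _, r in non_target]
labels = [target_labels[t] for t, _ in non_target]  # each row inherits its target's class

best = evolve(records, labels, n_features=1, length=1)
bags = summarize(non_target, best)
print(best, bags)
```

In this toy data only attribute 0 separates the two classes, so the search should settle on it. The per-target `Counter` bags are the kind of pattern representation that a summarization step could then weight and cluster.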