Impact of Feature Selection Methods on the Performance of Credit Risk Classification Algorithms

N. P. Singh, Devender Singh
Published in: 2019 IEEE 13th International Conference on Application of Information and Communication Technologies (AICT)
Publication date: 2019-10-01
DOI: 10.1109/AICT47866.2019.8981771 (https://doi.org/10.1109/AICT47866.2019.8981771)
Citations: 0

Abstract

This paper presents an ensemble of filter feature selection algorithms for a classification problem in the context of credit risk assessment for a financial institution. Feature selection is one of the most important aspects of data mining and machine learning. Its main objectives are to reduce computing resources, reduce future data collection costs, reduce model complexity, avoid overfitting, and improve the performance of machine learning algorithms. In this paper, the set of available variables is first reduced using filter feature selection methods such as chi-square, gain ratio, information gain, ReliefF, and symmetric uncertainty. In addition, ensemble feature selection of the input variables based on these individual methods is also used. The impact of feature selection is measured by fitting seven classification algorithms: Random Forest, C4.5, PART, C5.0, Bagging, Boosting, and linear SVM. The performance of the models is compared using accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and AUC. The data used is the German bank dataset of 1000 records with 20 features and one target variable.
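
The abstract does not say how the individual filter rankings are combined, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It computes three of the named entropy-based filters (information gain, gain ratio, symmetric uncertainty) for discrete features and ensembles them by mean rank. The mean-rank aggregation, the function names, and the toy data are all hypothetical; chi-square and ReliefF are left out to keep the example short.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def filter_scores(feature, target):
    """Information gain, gain ratio, and symmetric uncertainty of one feature."""
    h_x, h_y = entropy(feature), entropy(target)
    h_xy = entropy(list(zip(feature, target)))  # joint entropy H(X, Y)
    info_gain = h_x + h_y - h_xy                # equals H(Y) - H(Y | X)
    gain_ratio = info_gain / h_x if h_x > 0 else 0.0
    sym_unc = 2 * info_gain / (h_x + h_y) if h_x + h_y > 0 else 0.0
    return {"info_gain": info_gain, "gain_ratio": gain_ratio, "sym_unc": sym_unc}


def ensemble_rank(data, target):
    """Order features by mean rank across the filters (assumed aggregation rule)."""
    methods = ["info_gain", "gain_ratio", "sym_unc"]
    scores = {name: filter_scores(col, target) for name, col in data.items()}
    mean_rank = {name: 0.0 for name in data}
    for method in methods:
        # Rank 1 goes to the feature with the highest score under this filter.
        ordered = sorted(data, key=lambda name: scores[name][method], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            mean_rank[name] += rank / len(methods)
    return sorted(data, key=lambda name: mean_rank[name])


# Hypothetical toy data: eight applicants, three categorical features.
target = ["good", "good", "bad", "bad", "good", "bad", "good", "bad"]
data = {
    "noise":    ["x", "y", "x", "y", "x", "x", "y", "y"],  # independent of target
    "partial":  ["a", "a", "b", "b", "a", "b", "b", "a"],  # partly predictive
    "checking": ["p", "p", "q", "q", "p", "q", "p", "q"],  # fully predictive
}

ranking = ensemble_rank(data, target)
print(ranking)
```

In a full study the top-ranked subset would then be passed to each classifier, and the resulting models compared on the metrics the abstract lists.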