Impact of Feature Selection Methods on the Performance of Credit Risk Classification Algorithms

2019 IEEE 13th International Conference on Application of Information and Communication Technologies (AICT), Pub Date: 2019-10-01, DOI: 10.1109/AICT47866.2019.8981771

N. P. Singh, Devender Singh


Citations: 0

Abstract

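This paper presents an ensemble of filter feature selection algorithms for a classification problem: assessing credit risk for a financial institution. Feature selection is one of the most important aspects of data mining and machine learning. Its main objectives are to reduce computing resources and future data collection costs, reduce model complexity, avoid overfitting, and improve the performance of machine learning algorithms. In this paper, the set of available variables is first reduced using filter feature selection methods such as chi-square, gain ratio, information gain, ReliefF, and symmetric uncertainty. In addition, an ensemble feature selection built on these individual methods is applied to the input variables. The impact of feature selection is measured by fitting seven classification algorithms: Random Forest, C4.5, PART, C5.0, Bagging, Boosting, and linear SVM. Model performance is compared using accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and AUC. The data is German bank data of 1000 records with 20 features and one target variable.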
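The entropy-based filters named in the abstract (information gain, gain ratio and symmetric uncertainty) and the chi-square filter each score one feature at a time against the class label. The following is a minimal Python sketch, assuming categorical (or already discretised) features and a categorical target; the function names are illustrative, and this is not the authors' code, which the abstract does not describe. Numeric attributes would need discretising first, which is likewise an assumption here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(y, x):
    """H(Y | X): class entropy inside each value of feature x, weighted by frequency."""
    n = len(y)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(x, y):
    """IG(Y; X) = H(Y) - H(Y | X)."""
    return entropy(y) - conditional_entropy(y, x)


def gain_ratio(x, y):
    """Information gain divided by the split information H(X)."""
    hx = entropy(x)
    return information_gain(x, y) / hx if hx > 0 else 0.0


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG / (H(X) + H(Y)), which lies in [0, 1]."""
    denom = entropy(x) + entropy(y)
    return 2.0 * information_gain(x, y) / denom if denom > 0 else 0.0


def chi_square(x, y):
    """Pearson chi-square statistic of the feature-value by class contingency table."""
    n = len(y)
    observed = Counter(zip(x, y))
    x_counts, y_counts = Counter(x), Counter(y)
    stat = 0.0
    for xv, xc in x_counts.items():
        for yv, yc in y_counts.items():
            expected = xc * yc / n  # cell count expected under independence
            stat += (observed[(xv, yv)] - expected) ** 2 / expected
    return stat
```

For example, a feature that perfectly separates a balanced two-class label, `x = ["a", "a", "b", "b"]` with `y = [1, 1, 0, 0]`, gets an information gain, gain ratio and symmetric uncertainty of 1.0 and a chi-square of 4.0, while `x = ["a", "b", "a", "b"]`, which is independent of that label, scores 0 on all four. Gain ratio and symmetric uncertainty normalise information gain so that features with many distinct values are not automatically favoured.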

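ReliefF, another of the filters listed in the abstract, raises a feature's weight when the feature differs between an instance and its nearest neighbours of the other class (misses) and lowers it when the feature differs from nearest neighbours of the same class (hits). The sketch below is a simplified two-class version under stated assumptions: discrete features, Hamming distance, every instance used as a probe instead of random sampling, and k nearest hits and misses. With two classes the class-prior factor of full ReliefF equals 1 and drops out. The function name is hypothetical and this is an illustration, not the implementation used in the paper.

```python
def relieff(X, y, k=1):
    """Simplified two-class ReliefF weights for discrete features.

    X is a list of equal-length feature tuples, y the class labels.
    Distance is the number of differing features (Hamming distance).
    """
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def distance(a, b):
        return sum(ai != bi for ai, bi in zip(a, b))

    for i in range(n):
        probe = X[i]
        # Other instances ordered by distance to the probe; ties keep index order.
        others = sorted((j for j in range(n) if j != i), key=lambda j: distance(probe, X[j]))
        hits = [j for j in others if y[j] == y[i]][:k]
        misses = [j for j in others if y[j] != y[i]][:k]
        for a in range(d):
            # Average 0/1 difference on feature a to the k nearest hits and misses.
            hit_diff = sum(probe[a] != X[j][a] for j in hits) / max(len(hits), 1)
            miss_diff = sum(probe[a] != X[j][a] for j in misses) / max(len(misses), 1)
            # Reward separation from misses, penalise separation from hits.
            weights[a] += (miss_diff - hit_diff) / n
    return weights
```

On four instances where feature 0 equals the class and feature 1 is noise, `relieff([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 1, 1])` returns `[1.0, -1.0]`. Unlike the single-feature scores, ReliefF judges each feature in the context of all the others through the neighbour search.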

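The abstract says the ensemble feature selection is built on the individual filter methods but does not say how their outputs are combined. Mean-rank aggregation, sketched below, is one common choice and is an assumption here, as are the function names: each filter's scores are turned into ranks, the ranks are averaged per feature, and the k features with the best mean rank are kept.

```python
def rank_features(scores):
    """Map {feature: score} to {feature: rank}, where rank 1 is the highest score."""
    ordered = sorted(scores, key=lambda f: scores[f], reverse=True)
    return {f: i + 1 for i, f in enumerate(ordered)}


def ensemble_select(score_tables, k):
    """Average each feature's rank across filter methods and keep the k best.

    score_tables: list of {feature: score} dicts, one per filter method,
    all covering the same features. Ties in mean rank are broken by name.
    """
    ranks = [rank_features(table) for table in score_tables]
    features = list(score_tables[0])
    mean_rank = {f: sum(r[f] for r in ranks) / len(ranks) for f in features}
    return sorted(features, key=lambda f: (mean_rank[f], f))[:k]
```

For instance, with hypothetical scores under which chi-square ranks features f1, f3, f2, f4, information gain ranks f1, f2, f3, f4, and symmetric uncertainty ranks f1, f3, f2, f4, the mean ranks are 1, 7/3, 8/3 and 4, so `ensemble_select(..., k=2)` keeps f1 and f3. Aggregating ranks rather than raw scores sidesteps the different scales of the filters: chi-square is unbounded, while symmetric uncertainty lies in [0, 1].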

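The six comparison measures listed in the abstract come from a confusion matrix (accuracy, sensitivity, specificity, positive and negative predictive value) and from classifier scores (AUC). In the sketch below, which label counts as positive (good or bad credit) is a parameter, because the abstract does not say; AUC is computed through its rank (Mann-Whitney) interpretation, with ties counted as one half. Function names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for the chosen positive label."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")


def classification_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix measures used to compare the classifiers."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "ppv": _ratio(tp, tp + fp),  # positive predictive value (precision)
        "npv": _ratio(tn, tn + fn),  # negative predictive value
    }


def auc(y_true, scores, positive=1):
    """ROC AUC: probability a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

With `y_true = [1, 1, 1, 0, 0, 0, 0, 1]`, `y_pred = [1, 0, 1, 0, 0, 1, 1, 1]` and `scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.6, 0.1, 0.7]`, the counts are TP = 3, FP = 2, TN = 2, FN = 1, giving accuracy 0.625, sensitivity 0.75, specificity 0.5, PPV 0.6, NPV 2/3 and AUC 0.9375. Reporting sensitivity and specificity alongside accuracy matters for credit data, where the classes are usually imbalanced.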




Source journal

2019 IEEE 13th International Conference on Application of Information and Communication Technologies (AICT)

Self-citation rate

0.00%

Publications