A machine learning workflow to address credit default prediction
Rambod Rahmani, Marco Parola, Mario G. C. A. Cimino
arXiv - QuantFin - Risk Management, 2024-03-06
DOI: arxiv-2403.03785 (https://doi.org/arxiv-2403.03785)
Abstract
Due to the recent increase in interest in Financial Technology (FinTech),
applications like credit default prediction (CDP) are gaining significant
industrial and academic attention. In this regard, CDP plays a crucial role in
assessing the creditworthiness of individuals and businesses, enabling lenders
to make informed decisions regarding loan approvals and risk management. In
this paper, we propose a workflow-based approach to improve CDP, which refers
to the task of assessing the probability that a borrower will default on his or
her credit obligations. The workflow consists of multiple steps, each designed
to leverage the strengths of different techniques featured in machine learning
pipelines and thus best solve the CDP task. We employ a comprehensive and
systematic approach starting with data preprocessing using Weight of Evidence
encoding, a technique that scales the data in a single step by removing
outliers, handling missing values, and making data uniform for models working
with different data types. Next, we train several families of learning models,
introducing ensemble techniques to build more robust models and hyperparameter
optimization via multi-objective genetic algorithms to consider both predictive
accuracy and financial aspects. Our research aims to contribute to the
FinTech industry by providing a tool to move toward more accurate and reliable
credit risk assessment, benefiting both lenders and borrowers.
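
To make the Weight of Evidence (WoE) preprocessing step more concrete, the following is a minimal sketch for a single categorical feature. It is an illustration only, not the authors' implementation: the abstract does not specify the binning scheme, the sign convention, or how empty bins are handled. Here WoE is assumed to be the natural log of the share of non-defaulters over the share of defaulters in each bin, missing values are assumed to form their own bin, and a small additive smoothing term (an assumption) avoids taking the log of zero. The function names `woe_table` and `woe_encode` and the sample data are hypothetical.

```python
import math
from collections import Counter

MISSING = "MISSING"  # hypothetical label for the bin that collects missing values


def woe_table(values, defaulted, smoothing=0.5):
    """Map each bin of a categorical feature to its Weight of Evidence.

    Assumed convention:
        WoE(bin) = ln(share of non-defaulters in bin / share of defaulters in bin)
    `smoothing` is added to every count so that empty cells do not yield log(0).
    """
    bins = [MISSING if v is None else v for v in values]
    good = Counter(b for b, d in zip(bins, defaulted) if not d)
    bad = Counter(b for b, d in zip(bins, defaulted) if d)
    labels = set(bins)
    total_good = sum(good.values()) + smoothing * len(labels)
    total_bad = sum(bad.values()) + smoothing * len(labels)
    table = {}
    for b in labels:
        good_share = (good[b] + smoothing) / total_good
        bad_share = (bad[b] + smoothing) / total_bad
        table[b] = math.log(good_share / bad_share)
    return table


def woe_encode(values, table):
    """Replace each raw value with the WoE of its bin (bins must exist in `table`)."""
    return [table[MISSING if v is None else v] for v in values]


# Toy data (made up for illustration): housing status and a 0/1 default flag.
housing = ["rent", "own", "rent", None, "own", "rent", "own", None]
default_flag = [1, 0, 1, 1, 0, 0, 0, 0]

table = woe_table(housing, default_flag)
encoded = woe_encode(housing, table)
```

Under this convention, a bin with a positive WoE contains relatively more non-defaulters and a bin with a negative WoE relatively more defaulters. Because every value, including a missing one, is mapped to a number on the same log-odds scale, the encoded feature can be passed to models that expect purely numeric input. Numeric features would first need to be binned, which this sketch leaves out.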