Guided Self-Training based Semi-Supervised Learning for Fraud Detection
Awanish Kumar, Soumyadeep Ghosh, Janu Verma
Proceedings of the Third ACM International Conference on AI in Finance, 2022-10-26
DOI: https://doi.org/10.1145/3533271.3561783
Citation count: 0
Abstract
Semi-supervised learning has attracted the attention of AI researchers in recent years, especially since the advent of deep learning methods and their success in several real-world applications. Most deep learning models require large amounts of labelled data, which is expensive to obtain. Fraud detection is an important problem for several industries, and large amounts of data are often available. However, obtaining labelled data is cumbersome, so semi-supervised learning is well positioned to help build robust and accurate supervised models. In this work, we consider different kinds of fraud detection paradigms and show that a self-training based semi-supervised learning approach can produce significant improvements over a model that has been trained on a limited set of labelled data. We propose a novel self-training approach based on a guided sharpening technique that uses a pair of autoencoders, which provide useful cues for incorporating unlabelled data into the training process. We conduct thorough experiments and analysis on three real-world databases to showcase the effectiveness of the approach. On the Elliptic Bitcoin fraud dataset, we show that utilizing unlabelled data improves the F1 score of the model trained on limited labelled data by around 10%.
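The abstract does not spell out how the guided sharpening works, so the following is only a minimal sketch of one generic self-training step, not the authors' method. It assumes plain temperature sharpening (raising each class probability to the power 1/T and renormalising) followed by a confidence threshold for accepting pseudo-labels; the autoencoder guidance is not modelled. The function names, the temperature of 0.5, the threshold of 0.9, and the example predictions are all hypothetical.

```python
from typing import List, Tuple


def sharpen(probs: List[float], temperature: float = 0.5) -> List[float]:
    """Temperature-sharpen a probability vector: p_i^(1/T) / sum_j p_j^(1/T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [v / total for v in powered]


def select_pseudo_labels(
    unlabelled_probs: List[List[float]],
    temperature: float = 0.5,   # assumed value, not from the paper
    threshold: float = 0.9,     # assumed value, not from the paper
) -> List[Tuple[int, int, float]]:
    """Return (sample_index, pseudo_label, confidence) for confident samples."""
    selected = []
    for idx, probs in enumerate(unlabelled_probs):
        sharpened = sharpen(probs, temperature)
        confidence = max(sharpened)
        # Keep only samples whose sharpened prediction is confident enough.
        if confidence >= threshold:
            selected.append((idx, sharpened.index(confidence), confidence))
    return selected


# Hypothetical classifier outputs [P(legitimate), P(fraud)] for three unlabelled rows.
predictions = [[0.80, 0.20], [0.55, 0.45], [0.10, 0.90]]
pseudo = select_pseudo_labels(predictions)
```

In a full self-training loop, the selected rows would be added to the labelled set with their pseudo-labels and the classifier retrained, repeating until no new rows pass the threshold. With the values above, the ambiguous second row is left unlabelled while the first and third are accepted.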