Ivan Malashin, Igor Masich, Vadim Tynchenko, Vladimir Nelyub, Aleksei Borodulin, Andrei Gantimurov
{"title":"Application of Natural Language Processing and Genetic Algorithm to Fine-Tune Hyperparameters of Classifiers for Economic Activities Analysis","authors":"Ivan Malashin, Igor Masich, Vadim Tynchenko, Vladimir Nelyub, Aleksei Borodulin, Andrei Gantimurov","doi":"10.3390/bdcc8060068","DOIUrl":null,"url":null,"abstract":"This study proposes a method for classifying economic activity descriptors to match Nomenclature of Economic Activities (NACE) codes, employing a blend of machine learning techniques and expert evaluation. By leveraging natural language processing (NLP) methods to vectorize activity descriptors and utilizing genetic algorithm (GA) optimization to fine-tune hyperparameters in multi-class classifiers like Naive Bayes, Decision Trees, Random Forests, and Multilayer Perceptrons, our aim is to boost the accuracy and reliability of an economic classification system. This system faces challenges due to the absence of precise target labels in the dataset. Hence, it is essential to initially check the accuracy of utilized methods based on expert evaluations using a small dataset before generalizing to a larger one.","PeriodicalId":505155,"journal":{"name":"Big Data and Cognitive Computing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Big Data and Cognitive Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/bdcc8060068","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
This study proposes a method for classifying economic activity descriptors to match Nomenclature of Economic Activities (NACE) codes, employing a blend of machine learning techniques and expert evaluation. By leveraging natural language processing (NLP) methods to vectorize activity descriptors and utilizing genetic algorithm (GA) optimization to fine-tune hyperparameters in multi-class classifiers like Naive Bayes, Decision Trees, Random Forests, and Multilayer Perceptrons, our aim is to boost the accuracy and reliability of an economic classification system. This system faces challenges due to the absence of precise target labels in the dataset. Hence, it is essential to initially check the accuracy of utilized methods based on expert evaluations using a small dataset before generalizing to a larger one.