Effective Products Categorization with Importance Scores and Morphological Analysis of the Titles
Authors: Leonidas Akritidis, Athanasios Fevgas, Panayiotis Bozanis
Venue: 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI)
Publication date: 2018-11-01
DOI: https://doi.org/10.1109/ICTAI.2018.00041
Citations: 6
Abstract
Over the past few years, e-commerce platforms and marketplaces have enriched their services with new features to improve the user experience and increase profitability. Such features include related-product suggestions, personalized recommendations, query understanding algorithms, and numerous others. Implementing these features effectively requires a robust product categorization method. Because of its importance, the problem of automatically classifying products into a given taxonomy has attracted the attention of multiple researchers. The current literature offers a broad variety of solutions, ranging from supervised and deep learning algorithms to convolutional and recurrent neural networks. In this paper we introduce a supervised learning method that performs morphological analysis of product titles by extracting and processing a combination of words and n-grams. Each of these tokens then receives an importance score according to several criteria that reflect how strongly the token correlates with a category. Based on these importance scores, we also propose a dimensionality reduction technique that shrinks the feature space without sacrificing much of the algorithm's performance. We evaluated our method experimentally on a real-world dataset of approximately 320 thousand product titles, which we acquired by crawling a product comparison Web platform. The results indicate that our approach is highly accurate, achieving a classification accuracy of over 95%.
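The pipeline outlined in the abstract (tokenize titles into words and n-grams, score each token by how strongly it points to a category, prune low-scoring tokens, then classify) can be illustrated with a minimal sketch. The abstract does not state the scoring criteria, so the `importance` formula below is a hypothetical stand-in rather than the authors' definition, and the function names, the `keep` fraction, and the toy titles are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def extract_tokens(title, max_n=2):
    """Return the distinct words and word n-grams (n <= max_n) of a title."""
    words = title.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }


def train(labeled_titles, max_n=2):
    """Count, for every token, how many titles of each category contain it."""
    token_cats = defaultdict(Counter)
    for title, category in labeled_titles:
        for tok in extract_tokens(title, max_n):
            token_cats[tok][category] += 1
    return dict(token_cats)


def importance(tok, token_cats, num_categories):
    """Hypothetical score (not the paper's): longer tokens that occur in
    fewer distinct categories receive a higher value."""
    n_words = tok.count(" ") + 1
    return n_words * math.log(1 + num_categories / len(token_cats[tok]))


def prune(token_cats, num_categories, keep=0.5):
    """Dimensionality reduction: keep the top `keep` fraction of tokens."""
    ranked = sorted(
        token_cats,
        key=lambda t: importance(t, token_cats, num_categories),
        reverse=True,
    )
    kept = ranked[:max(1, int(len(ranked) * keep))]
    return {t: token_cats[t] for t in kept}


def classify(title, token_cats, num_categories, max_n=2):
    """Return the category with the highest accumulated score, or None
    when no token of the title is known to the model."""
    scores = Counter()
    for tok in extract_tokens(title, max_n):
        if tok not in token_cats:
            continue
        weight = importance(tok, token_cats, num_categories)
        total = sum(token_cats[tok].values())
        for category, count in token_cats[tok].items():
            scores[category] += weight * count / total
    return scores.most_common(1)[0][0] if scores else None


# Toy data, invented for this example only.
data = [
    ("apple iphone 8 64gb", "phones"),
    ("samsung galaxy s9 64gb", "phones"),
    ("dell inspiron 15 laptop", "laptops"),
    ("hp pavilion 15 laptop", "laptops"),
]
model = train(data)
num_categories = len({category for _, category in data})
small = prune(model, num_categories, keep=0.5)
print(classify("samsung galaxy note 64gb", small, num_categories))
```

Pruning by score before classification mirrors the trade-off the abstract describes: the model keeps fewer tokens, and the tokens it drops are the ones this stand-in score considers least informative.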