Designing an algorithm for annotating Russian-language text data of social media using transfer learning
D.S. Bakanov, A.V. Kupriyanov
2023 IX International Conference on Information Technology and Nanotechnology (ITNT), 2023-04-17
DOI: 10.1109/ITNT57377.2023.10139023
Abstract
This article examines approaches to building an algorithm for annotating Russian-language social media texts, where annotation is defined as estimating the emotional coloring (sentiment) of a text. It covers both classical statistical learning methods and modern deep learning methods based on transfer learning and transformers. The main obstacle to sentiment analysis of Russian-language texts is the lack of a large labeled corpus, which severely limits model training. The article concludes by developing a model that combines a transformer with gradient boosting. The relevance of this work lies in creating a model with low memory consumption that does not depend on the topic of a post, is trained on a small amount of data, and can be used to analyze the textual content of social media posts.
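The abstract does not say how the transformer and gradient boosting are combined. One common arrangement, assumed here, is to keep a pretrained encoder frozen, use it only to turn each post into a fixed-length vector, and train a small gradient-boosted model on those vectors, which suits a small labeled corpus. The sketch below illustrates that arrangement in pure Python. Everything in it is hypothetical: `embed` is a stand-in built from hashed character trigrams (not a transformer), the booster is a minimal log-loss implementation with regression stumps, and the six labeled posts are invented toy data.

```python
import math
import zlib


def embed(text, dim=32):
    """Hypothetical stand-in for a frozen pretrained encoder (NOT a transformer):
    L2-normalised counts of hashed character trigrams."""
    vec = [0.0] * dim
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Least-squares regression stump: (feature, threshold, left value, right value)."""
    mean = sum(residuals) / len(residuals)
    best_err, best = float("inf"), (0, float("inf"), mean, mean)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not right:  # threshold at the maximum value splits nothing
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if err < best_err:
                best_err, best = err, (j, t, lm, rm)
    return best


def stump_value(stump, x):
    j, t, left, right = stump
    return left if x[j] <= t else right


def fit_boosting(X, y, rounds=20, lr=0.5):
    """Gradient boosting on log-loss: each stump fits the residuals y - p."""
    p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(p / (1 - p))  # initial score: log-odds of the positive class
    scores = [f0] * len(X)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + lr * stump_value(stump, x) for s, x in zip(scores, X)]
    return f0, lr, stumps


def predict_proba(model, x):
    f0, lr, stumps = model
    return sigmoid(f0 + lr * sum(stump_value(s, x) for s in stumps))


def log_loss(y, probs):
    return -sum(yi * math.log(p) + (1 - yi) * math.log(1 - p)
                for yi, p in zip(y, probs)) / len(y)


# Invented toy posts: 1 = positive, 0 = negative.
texts = [
    "отличный фильм, очень понравился",
    "ужасный сервис, больше не приду",
    "прекрасный день, всё замечательно",
    "плохое качество, полное разочарование",
    "спасибо, всё супер",
    "кошмар, деньги на ветер",
]
labels = [1, 0, 1, 0, 1, 0]

X = [embed(t) for t in texts]          # frozen feature extraction
model = fit_boosting(X, labels)        # only the booster is trained
probs = [predict_proba(model, x) for x in X]
baseline = log_loss(labels, [sum(labels) / len(labels)] * len(labels))
print(f"training log-loss: {baseline:.3f} -> {log_loss(labels, probs):.3f}")
```

In a realistic setup, `embed` would be replaced by sentence embeddings from a pretrained Russian or multilingual transformer, and the hand-written booster by a library implementation. Only the booster's parameters are learned, which is why this kind of design can work with little labeled data and a modest memory footprint.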