Exploration of transfer learning capability of multilingual models for text classification

Maddineni Bhargava, K. Vijayan, Oshin Anand, Gaurav Raina

Proceedings of the 2023 5th International Conference on Pattern Recognition and Intelligent Systems, 2023-07-28. DOI: 10.1145/3609703.3609711
Abstract: The use of multilingual models for natural language processing is becoming increasingly popular in industrial and business applications, particularly in multilingual societies. In this study, we investigate the transfer learning capabilities of multilingual language models such as mBERT and XLM-R across several Indian languages. We study the performance characteristics of a classifier model with mBERT/XLM-R as the front-end, trained in only one language, on two tasks: text categorization of news articles and sentiment analysis of product reviews. News articles on the same event but in different languages are representative of what may be termed 'inherently parallel' data, i.e. data that exhibits similar content across multiple languages, though not necessarily in parallel sentences. Other examples of such data are customer inquiries and reviews about the same product, social media activity pertaining to the same topic, and so on. After training in one language, we study the performance characteristics of this classifier model when applied to other languages. Our experiments reveal that, by exploiting the inherently parallel nature of the data, XLM-R performs remarkably well when adapted to any Indian-language dataset. Further, our study reveals the importance of simultaneously fine-tuning multilingual models with in-domain data from one language in order to elicit their cross-lingual and domain transfer learning abilities together.
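To make the experimental setup concrete, the sketch below shows the standard zero-shot cross-lingual transfer recipe the abstract describes: fine-tune a classifier head on top of a multilingual encoder (here the public xlm-roberta-base checkpoint) using labelled data in one language, then evaluate the unchanged model on another language. This is a minimal illustration, not the authors' code; the toy sentences, label scheme, and hyperparameters are placeholder assumptions, and the paper's actual news and product-review datasets are not reproduced here.

```python
# Minimal sketch (assumptions, not the paper's pipeline): fine-tune an
# XLM-R sentiment classifier on one language, evaluate zero-shot on another.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2)  # e.g. negative=0 / positive=1

# Hypothetical training examples in the source language (English).
train_texts = ["The phone works beautifully.", "Battery died within a week."]
train_labels = torch.tensor([1, 0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the toy batch
    enc = tokenizer(train_texts, padding=True, truncation=True,
                    return_tensors="pt")
    loss = model(**enc, labels=train_labels).loss  # cross-entropy
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Zero-shot evaluation in a target language (Hindi) never seen during
# fine-tuning: the shared multilingual encoder carries the transfer.
model.eval()
test_texts = ["फ़ोन बहुत अच्छा काम करता है।"]  # "The phone works very well."
with torch.no_grad():
    enc = tokenizer(test_texts, padding=True, truncation=True,
                    return_tensors="pt")
    pred = model(**enc).logits.argmax(dim=-1)
print(pred)  # tensor([1]) if the cross-lingual transfer succeeds
```

The key design point, consistent with the abstract's conclusion, is that the classifier head is trained on in-domain data from a single language while the encoder's multilingual pretraining is relied on for generalization to the other languages.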