Xiaoming Lu, Wenjian Liu, Shengyi Jiang, Changqing Liu
{"title":"多语言BERT跨语言可移植性与预训练的切线表示:一项调查","authors":"Xiaoming Lu, Wenjian Liu, Shengyi Jiang, Changqing Liu","doi":"10.1109/ICNLP58431.2023.00048","DOIUrl":null,"url":null,"abstract":"Natural Language Processing (NLP) systems have three main components including tokenization, embedding, and model architectures (top deep learning models such as BERT, GPT-2, or GPT-3). In this paper, the authors attempt to explore and sum up possible ways of fine-tuning the Multilingual BERT (mBERT) model and feeding it with effective encodings of Tangut characters. Tangut is an extinct low-resource language. We expect to introduce a tailored embedding layer into Tangut as part of the fine-tuning procedure without altering mBERT internal structure. The initial work is listed on. By reviewing existing State of the Art (SOTA) approaches, we hope to further analyze the performance boost of mBERT when applied to low-resource languages.","PeriodicalId":53637,"journal":{"name":"Icon","volume":"14 1","pages":"229-234"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multilingual BERT Cross-Lingual Transferability with Pre-trained Representations on Tangut: A Survey\",\"authors\":\"Xiaoming Lu, Wenjian Liu, Shengyi Jiang, Changqing Liu\",\"doi\":\"10.1109/ICNLP58431.2023.00048\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Natural Language Processing (NLP) systems have three main components including tokenization, embedding, and model architectures (top deep learning models such as BERT, GPT-2, or GPT-3). In this paper, the authors attempt to explore and sum up possible ways of fine-tuning the Multilingual BERT (mBERT) model and feeding it with effective encodings of Tangut characters. Tangut is an extinct low-resource language. We expect to introduce a tailored embedding layer into Tangut as part of the fine-tuning procedure without altering mBERT internal structure. The initial work is listed on. By reviewing existing State of the Art (SOTA) approaches, we hope to further analyze the performance boost of mBERT when applied to low-resource languages.\",\"PeriodicalId\":53637,\"journal\":{\"name\":\"Icon\",\"volume\":\"14 1\",\"pages\":\"229-234\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Icon\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICNLP58431.2023.00048\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Arts and Humanities\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Icon","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICNLP58431.2023.00048","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Arts and Humanities","Score":null,"Total":0}
Multilingual BERT Cross-Lingual Transferability with Pre-trained Representations on Tangut: A Survey
Natural Language Processing (NLP) systems consist of three main components: tokenization, embedding, and model architecture (leading deep learning models such as BERT, GPT-2, or GPT-3). In this paper, the authors explore and summarize possible ways of fine-tuning the Multilingual BERT (mBERT) model and feeding it effective encodings of Tangut characters. Tangut is an extinct, low-resource language. We propose introducing a tailored embedding layer for Tangut as part of the fine-tuning procedure, without altering mBERT's internal architecture. The initial work is also presented. By reviewing existing State-of-the-Art (SOTA) approaches, we aim to further analyze the performance gains mBERT can offer when applied to low-resource languages.
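To make the idea of adding Tangut support without touching mBERT's internals concrete, the following is a minimal sketch (not the authors' implementation) using the Hugging Face `transformers` library: new Tangut character tokens are added to the tokenizer and the input embedding matrix is resized, while the pre-trained Transformer layers remain unchanged. The character range used here is a placeholder for illustration.

```python
# Minimal sketch (assumption, not the paper's actual method): extend mBERT's
# vocabulary with Tangut characters and resize the embedding matrix, leaving
# the Transformer layers untouched. Requires the `transformers` library.
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_NAME = "bert-base-multilingual-cased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)

# Hypothetical subset of the Tangut Unicode block (U+17000-U+187FF);
# a real setup would enumerate the characters present in the corpus.
tangut_chars = [chr(cp) for cp in range(0x17000, 0x17010)]

# Add the new characters as tokens; only genuinely unseen ones are appended.
num_added = tokenizer.add_tokens(tangut_chars)

# Grow the input embedding table to cover the new tokens. The new rows are
# randomly initialized and learned during fine-tuning; all other weights
# (attention and feed-forward layers) keep their pre-trained values.
model.resize_token_embeddings(len(tokenizer))

print(f"Added {num_added} Tangut tokens; vocabulary size is now {len(tokenizer)}")
```

In this sketch, only the embedding rows for the newly added tokens need to be learned from scratch during fine-tuning, which mirrors the paper's goal of adapting mBERT to Tangut without modifying its internal structure.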