Cross-linguistic Comparison of Linguistic Feature Encoding in BERT Models for Typologically Different Languages

Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP. Pub Date: 1900-01-01. DOI: 10.18653/v1/2022.sigtyp-1.4

Yulia Otmakhova, Karin M. Verspoor, Jey Han Lau

Citations: 3

Abstract

Although there has recently been increased interest in how pre-trained language models encode different linguistic features, there is still a lack of systematic comparison between languages with different morphology and syntax. In this paper, using BERT as an example of a pre-trained model, we compare how three typologically different languages (English, Korean, and Russian) encode morphological and syntactic features across different layers. In particular, we contrast languages that differ in a particular aspect, such as flexibility of word order, head directionality, morphological type, presence of grammatical gender, and morphological richness, across four different tasks.
