Improving Cuneiform Language Identification with BERT

Source: Proceedings of the Sixth Workshop on | DOI: 10.18653/v1/W19-1402

Gabriel Bernier-Colborne, Cyril Goutte, Serge Léger

Citations: 20

Abstract

We describe the systems developed by the National Research Council Canada for the Cuneiform Language Identification (CLI) shared task at the 2019 VarDial evaluation campaign. We compare three systems: a state-of-the-art baseline relying on character n-grams and a traditional statistical classifier, a voting ensemble of classifiers, and a deep learning approach using a Transformer network. We describe how these systems were trained, and analyze the impact of some preprocessing and model estimation decisions. The deep neural network achieved 77% accuracy on the test data, which turned out to be the best performance at the CLI evaluation, establishing a new state of the art for cuneiform language identification.
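The character n-gram baseline mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which statistical classifier was used, so a multinomial naive Bayes model with add-one smoothing stands in here as an assumption. The n-gram range (1 to 3), the class name `NgramNaiveBayes`, the labels `lang_a` and `lang_b`, and the toy strings are all hypothetical placeholders, not the shared-task data or the authors' configuration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of length n_min..n_max found in text."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-gram counts.

    Assumption: this is a stand-in classifier, not the one used in the paper.
    """

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)   # label -> n-gram counts
        self.docs = Counter(labels)          # label -> number of training texts
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, text):
        grams = char_ngrams(text)
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            # Add-one smoothing: unseen n-grams get a small non-zero probability.
            denom = sum(counter.values()) + len(self.vocab)
            score = math.log(self.docs[label] / total_docs)
            score += sum(math.log((counter[g] + 1) / denom) for g in grams)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data; real cuneiform input would be Unicode sign sequences.
train_texts = ["abab abba", "baab abab", "cdcd dccd", "dcdc cddc"]
train_labels = ["lang_a", "lang_a", "lang_b", "lang_b"]
model = NgramNaiveBayes().fit(train_texts, train_labels)
print(model.predict("abba"))
```

Because the features are plain character substrings, the same code would apply unchanged to strings of cuneiform code points; only the training data would differ.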
