JHU System Description for the MADAR Arabic Dialect Identification Shared Task

WANLP@ACL 2019. Pub Date: 2019-08-01. DOI: 10.18653/v1/W19-4634

Thomas Lippincott, Pamela Shapiro, Kevin Duh, Paul McNamee

Citations: 5

Abstract

Our submission to the MADAR shared task on Arabic dialect identification employed a language modeling technique called Prediction by Partial Matching, an ensemble of neural architectures, and additional data sources for training word embeddings and auxiliary language models. We found that several of these techniques provided small boosts in performance, though a simple character-level language model was a strong baseline, and a lower-order LM achieved the best performance on Subtask 2. Interestingly, word embeddings provided no consistent benefit, and ensembling struggled to outperform the best component submodel. This suggests that the various architectures are learning redundant information, and future work may focus on encouraging decorrelated learning.
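To illustrate the kind of character-level language model baseline the abstract refers to, the sketch below trains one fixed-order character n-gram model per dialect label and assigns a sentence to the label whose model gives it the highest log-probability. This is not the authors' system: Prediction by Partial Matching blends several context orders through an escape mechanism, whereas this sketch substitutes plain add-alpha smoothing at a single order. The class and function names (`CharNgramLM`, `train`, `classify`), the default `order` and `alpha`, and the toy Latin-script data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class CharNgramLM:
    """Fixed-order character n-gram LM with add-alpha smoothing (illustrative only)."""

    def __init__(self, order=3, alpha=0.1):
        self.order = order            # n-gram order: context is order-1 characters
        self.alpha = alpha            # smoothing constant (assumed value)
        self.ngrams = Counter()       # counts of (context, next_char)
        self.contexts = Counter()     # counts of context
        self.vocab = set()

    def _pad(self, text):
        # Start and end markers so sentence boundaries are modeled too.
        return "\x02" * (self.order - 1) + text + "\x03"

    def fit(self, sentences):
        for sentence in sentences:
            padded = self._pad(sentence)
            self.vocab.update(padded)
            for i in range(self.order - 1, len(padded)):
                ctx = padded[i - self.order + 1:i]
                self.ngrams[(ctx, padded[i])] += 1
                self.contexts[ctx] += 1
        return self

    def logprob(self, text):
        padded = self._pad(text)
        v = len(self.vocab) + 1       # +1 reserves mass for unseen characters
        total = 0.0
        for i in range(self.order - 1, len(padded)):
            ctx = padded[i - self.order + 1:i]
            num = self.ngrams[(ctx, padded[i])] + self.alpha
            den = self.contexts[ctx] + self.alpha * v
            total += math.log(num / den)
        return total


def train(labeled, order=3):
    """Fit one LM per label from (text, label) pairs."""
    by_label = defaultdict(list)
    for text, label in labeled:
        by_label[label].append(text)
    return {lab: CharNgramLM(order).fit(texts) for lab, texts in by_label.items()}


def classify(models, text):
    """Return the label whose LM assigns the highest log-probability."""
    return max(models, key=lambda lab: models[lab].logprob(text))


# Toy data standing in for dialect-labeled sentences (hypothetical).
toy = [("aaaa ab aab", "A"), ("bbbb ba bba", "B")]
models = train(toy, order=3)
prediction = classify(models, "aaa")
```

Lowering `order` gives the kind of lower-order model the abstract reports as strongest on Subtask 2, although the appropriate order for real data would have to be chosen on held-out sentences.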


