Multilingual segmentation based on neural networks and pre-trained word embeddings

Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019. DOI: 10.18653/v1/W19-2716

Mikel Iruskieta, K. Bengoetxea, Aitziber Atutxa Salazar, A. D. Ilarraza

Citations: 5

Abstract

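The DISRPT 2019 workshop organized a shared task on identifying discourse segments across formalisms and languages. Elementary Discourse Units (EDUs) are quite similar across different theories, and segmentation is the very first stage of rhetorical annotation. Still, each annotation project has made its own design decisions, which affect not only the annotation of the relational discourse structure but also the segmentation stage. For this shared task, we performed segmentation with pre-trained word embeddings and a neural network (BiLSTM+CRF). We report F1 results for 6 languages: Basque (0.853), English (0.919), French (0.907), German (0.913), Portuguese (0.926) and Spanish (0.868 and 0.769). Finally, we carried out an error analysis for Basque and Spanish based on clause typology, in order to better understand the segmenter's performance.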
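A BiLSTM+CRF segmenter can be framed as token-level sequence labelling: each token is tagged as either opening a new EDU or continuing the current one, and the CRF layer selects the highest-scoring tag sequence. The sketch below illustrates that framing under stated assumptions and is not the authors' implementation. The label names BEGIN/INSIDE, the emission scores (which a BiLSTM over pre-trained word embeddings would normally produce) and the transition scores are all hypothetical. F1 is computed here over EDU-initial token positions, which may differ in detail from the official shared-task scorer.

```python
"""Sketch: EDU segmentation as sequence labelling with CRF (Viterbi) decoding.

Assumptions (hypothetical, not from the paper): two labels, BEGIN (token opens
an EDU) and INSIDE; emission scores are hard-coded stand-ins for BiLSTM output;
transition and start scores are hand-set for illustration.
"""
from typing import Dict, List, Sequence, Set, Tuple

LABELS = ("BEGIN", "INSIDE")  # hypothetical label names


def viterbi(emissions: Sequence[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            start: Dict[str, float]) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF."""
    # best[i][y]: best score of any label path that ends in y at position i
    best = [{y: start[y] + emissions[0][y] for y in LABELS}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in LABELS:
            # Score of reaching label y from each possible previous label p
            candidates = {p: best[-1][p] + transitions[(p, y)] for p in LABELS}
            prev = max(candidates, key=candidates.get)
            scores[y] = candidates[prev] + emissions[i][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final label and follow the back-pointers.
    label = max(LABELS, key=lambda y: best[-1][y])
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    return path[::-1]


def boundary_f1(gold: Set[int], predicted: Set[int]) -> float:
    """F1 over the positions of EDU-initial tokens."""
    if not gold and not predicted:
        return 1.0
    tp = len(gold & predicted)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example; all scores are made up for illustration.
tokens = ["Segmentation", "is", "hard", "because", "projects", "differ", "."]
emissions = [
    {"BEGIN": 2.0, "INSIDE": 0.0},   # Segmentation
    {"BEGIN": -1.0, "INSIDE": 1.0},  # is
    {"BEGIN": -1.0, "INSIDE": 1.0},  # hard
    {"BEGIN": 1.5, "INSIDE": 0.5},   # because
    {"BEGIN": -1.0, "INSIDE": 1.0},  # projects
    {"BEGIN": -1.0, "INSIDE": 1.0},  # differ
    {"BEGIN": -2.0, "INSIDE": 1.0},  # .
]
start = {"BEGIN": 0.0, "INSIDE": -5.0}  # a text should open with an EDU
transitions = {
    ("BEGIN", "BEGIN"): -2.0,  # discourage one-token EDUs
    ("BEGIN", "INSIDE"): 0.5,
    ("INSIDE", "BEGIN"): 0.0,
    ("INSIDE", "INSIDE"): 0.5,
}

path = viterbi(emissions, transitions, start)
predicted = {i for i, y in enumerate(path) if y == "BEGIN"}
print(list(zip(tokens, path)))
print(boundary_f1({0, 3}, predicted))  # prints 1.0
```

Here the decoder marks tokens 0 and 3 ("Segmentation", "because") as EDU-initial. The negative BEGIN to BEGIN transition penalises one-token EDUs, a dependency between neighbouring labels that an independent per-token classifier cannot express; this is the usual motivation for placing a CRF on top of a BiLSTM.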
