Multilingual segmentation based on neural networks and pre-trained word embeddings

Mikel Iruskieta, K. Bengoetxea, Aitziber Atutxa Salazar, A. D. Ilarraza
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019. DOI: 10.18653/v1/W19-2716 (https://doi.org/10.18653/v1/W19-2716)
Citations: 5

Abstract

The DISRPT 2019 workshop organized a shared task on identifying discourse segments across formalisms and languages. Elementary Discourse Units (EDUs) are quite similar across theories, and segmentation is the very first stage of rhetorical annotation. Still, each annotation project has made its own decisions, with consequences not only for the annotation of relational discourse structure but also for the segmentation stage. For this shared task, we used pre-trained word embeddings and a neural network (BiLSTM+CRF) to perform the segmentation. We report F1 results for 6 languages: Basque (0.853), English (0.919), French (0.907), German (0.913), Portuguese (0.926) and Spanish (0.868 and 0.769). Finally, we carried out an error analysis based on clause typology for Basque and Spanish in order to understand the performance of the segmenter.
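To make the reported F1 figures concrete, here is a minimal sketch of how a segmentation score of this kind can be computed. It assumes that segmentation is framed as per-token labeling, where "B" marks a token that begins a discourse segment and "O" marks any other token, and that F1 is taken over the "B" class only. The label names, the function `boundary_f1`, and the toy sequences are illustrative assumptions; the abstract does not describe the authors' evaluation code.

```python
from typing import List, Tuple


def boundary_f1(gold: List[str], pred: List[str],
                positive: str = "B") -> Tuple[float, float, float]:
    """Precision, recall and F1 over segment-initial tokens.

    Hypothetical helper: "B" (segment start) vs. "O" (other) is an
    assumed labeling scheme, not one taken from the abstract.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    pairs = list(zip(gold, pred))
    # True positives: boundaries predicted where the gold also has one.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: predicted boundaries that are not in the gold.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: gold boundaries the prediction missed.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (invented for illustration): one boundary missed, one spurious.
gold = ["B", "O", "O", "B", "O", "O", "B", "O"]
pred = ["B", "O", "O", "O", "O", "B", "B", "O"]
precision, recall, f1 = boundary_f1(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Scoring only the boundary class keeps the much more frequent "O" tokens from inflating the result, which is why a segmenter can have high token accuracy and still a modest F1.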