Building a global dictionary for semantic technologies

Eszter Iklódi, Gábor Recski, Gábor Borbély, María José Castro Bleda
IberSPEECH Conference, published 2018-11-21
DOI: 10.21437/IBERSPEECH.2018-60 (https://doi.org/10.21437/IBERSPEECH.2018-60)
Citations: 1

Abstract

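This paper proposes a novel method for finding linear mappings among word vectors of different languages. Unlike previous approaches, the method does not learn a translation matrix between two specific languages; instead, it learns one between each language and a shared, universal space. The system was trained in two modes: first on two languages, then on three languages simultaneously. In the first mode, two training sets were used: Dinu's English-Italian benchmark data [1] and English-Italian translation pairs extracted from the PanLex database [2]. In the second mode, only the PanLex database was used. On English-Italian, the best setting performs significantly better than the baseline system of Mikolov et al. [3] and is comparable to the more sophisticated systems of Faruqui and Dyer [4] and Dinu et al. [1]. By exploiting the richness of the PanLex database, the proposed method makes it possible to learn linear mappings among an arbitrary number of languages.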
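The idea of mapping every language into one shared space, rather than mapping language pairs onto each other, can be illustrated with a small sketch. The code below is not the training procedure from the paper: it swaps in generalized Procrustes alignment with orthogonal maps as a stand-in, and it assumes that row i of every embedding matrix refers to the same concept. The function names, the synthetic data, and the third language code "es" are all hypothetical and chosen only for illustration.

```python
import numpy as np


def procrustes(X, Z):
    """Return the orthogonal matrix T that minimizes ||X @ T - Z||_F."""
    U, _, Vt = np.linalg.svd(X.T @ Z)
    return U @ Vt


def fit_universal_maps(vectors, n_iter=10):
    """Learn one orthogonal map per language into a shared space.

    vectors: dict mapping a language code to an (n, d) array. Row i of every
    array is assumed to embed the same concept (a data layout assumed for
    this sketch, not taken from the paper).
    """
    dim = next(iter(vectors.values())).shape[1]
    maps = {lang: np.eye(dim) for lang in vectors}
    for _ in range(n_iter):
        # Current estimate of the universal space: mean of all mapped languages.
        universal = np.mean(
            [X @ maps[lang] for lang, X in vectors.items()], axis=0
        )
        # Re-fit every language against that estimate.
        maps = {lang: procrustes(X, universal) for lang, X in vectors.items()}
    return maps


def nearest_translation(word_vec, src, tgt, maps, tgt_vectors):
    """Return the row index in tgt_vectors that is closest to word_vec by
    cosine similarity, with both sides compared in the universal space."""
    query = word_vec @ maps[src]
    candidates = tgt_vectors @ maps[tgt]
    sims = candidates @ query / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(query)
    )
    return int(np.argmax(sims))


# Synthetic demo: three "languages" that are rotations of one hidden space.
rng = np.random.default_rng(0)
concepts = rng.normal(size=(50, 8))
vectors = {
    lang: concepts @ np.linalg.qr(rng.normal(size=(8, 8)))[0]
    for lang in ("en", "it", "es")
}
maps = fit_universal_maps(vectors)
match = nearest_translation(vectors["en"][3], "en", "it", maps, vectors["it"])
```

Because each language gets its own map into the shared space, adding a further language only requires one more matrix, not one matrix per language pair. Translation between any two languages then reduces to a nearest-neighbour lookup in the shared space.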