English-Chinese Cross Language Word Embedding Similarity Calculation
Like Wang, Yuan Sun, Xiaobing Zhao
Artificial Intelligence and Cloud Computing Conference, 2018-12-21
DOI: 10.1145/3299819.3299831 (https://doi.org/10.1145/3299819.3299831)
Abstract
Language differences among countries, regions, and nationalities create major obstacles to communication. Cross-language word similarity (CLWS) calculation is the most practical way to address this problem, and corpus selection is one of the factors that influence the result. This paper compares the similarity of word embeddings trained on bilingual parallel and non-parallel corpora using traditional models. First, it uses the fastText method to compute monolingual word embeddings for Chinese and English and computes the semantic similarity between the two sets of embeddings. Then it maps the word embeddings into an implicit shared space using Multilingual Unsupervised and Supervised Embeddings (MUSE) and compares the unsupervised and supervised machine learning methods on parallel and non-parallel corpora. Finally, the experimental results show that the MUSE model aligns the monolingual word embedding spaces better, and that non-parallel corpora perform as well as parallel corpora in calculating CLWS.
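The mapping step described above can be illustrated with a minimal sketch. It assumes the supervised alignment is an orthogonal Procrustes fit over dictionary-aligned word pairs, which is a common choice for this kind of mapping; the abstract does not state the exact procedure used. The toy random vectors stand in for fastText embeddings, and the names `cosine_similarity`, `procrustes_align`, `en`, and `zh` are hypothetical.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def procrustes_align(src, tgt):
    """Return the orthogonal matrix W that minimises ||src @ W - tgt||_F.

    src, tgt: (n_pairs, dim) arrays whose rows are the embeddings of
    dictionary-aligned word pairs (row i of src translates to row i of tgt).
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


# Toy data (assumption): "Chinese" vectors are a rotated copy of the
# "English" vectors, so a perfect orthogonal mapping exists.
rng = np.random.default_rng(0)
dim, n_words = 4, 6
en = rng.normal(size=(n_words, dim))
q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # random orthogonal matrix
zh = en @ q.T                                     # zh @ q recovers en

# Learn the mapping into the shared (English) space, then score each pair.
W = procrustes_align(zh, en)
zh_mapped = zh @ W
sims = [cosine_similarity(a, b) for a, b in zip(zh_mapped, en)]
print(sims)
```

With real embeddings the two spaces are not exact rotations of each other, so the similarities after mapping would fall below 1; the score for a word pair is then read as its cross-language similarity.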