Measuring cross-lingual semantic similarity across European languages
Lutfi Kerem Senel, Veysel Yücesoy, Aykut Koç, T. Çukur
2017 40th International Conference on Telecommunications and Signal Processing (TSP), July 2017
DOI: 10.1109/TSP.2017.8076005
Citation count: 12
Abstract
This paper studies cross-lingual semantic similarity (CLSS) between five European languages (i.e., English, French, German, Spanish and Italian) using unsupervised word embeddings of a cross-lingual lexicon. The vocabulary of each language is projected onto a separate high-dimensional vector space, and these vector spaces are then compared using several distance measures (e.g., correlation and cosine) to quantify the pairwise semantic similarity between the languages. A substantial degree of similarity is observed between the vector spaces learned from the corpora of the European languages. Null hypothesis testing and bootstrap methods (resampling without replacement) are used to validate the results.
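The abstract does not say exactly how two separately trained vector spaces are compared, so the sketch below assumes one common approach: because the spaces are not aligned, vectors are compared only within a language, and the two languages' cosine-similarity structures are then correlated. It assumes that row i of each embedding list is the same cross-lingual lexicon entry (a translation pair). The permutation test and the without-replacement subsampling are illustrative stand-ins for the paper's null-hypothesis and resampling procedures, not a reproduction of them. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("zero variance: correlation undefined")
    return cov / math.sqrt(vx * vy)


def pairwise_cosines(embs: Sequence[Vector]) -> List[float]:
    """Upper triangle (i < j) of the within-language cosine-similarity matrix."""
    n = len(embs)
    return [cosine(embs[i], embs[j]) for i in range(n) for j in range(i + 1, n)]


def space_similarity(emb_a: Sequence[Vector], emb_b: Sequence[Vector]) -> float:
    """Correlate the similarity structure of two languages' vector spaces.

    Assumption: emb_a[i] and emb_b[i] embed translations of the same lexicon
    entry. The spaces may differ in dimensionality, because vectors are only
    ever compared with other vectors from the same language.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("both languages must cover the same lexicon entries")
    return pearson(pairwise_cosines(emb_a), pairwise_cosines(emb_b))


def permutation_p_value(emb_a, emb_b, n_perm: int = 200, seed: int = 0) -> float:
    """One-sided p-value under the null that the translation pairing is arbitrary.

    Shuffling the rows of emb_b permutes the rows and columns of its similarity
    matrix together, which breaks the word-to-translation correspondence.
    """
    rng = random.Random(seed)
    observed = space_similarity(emb_a, emb_b)
    order = list(range(len(emb_b)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        if space_similarity(emb_a, [emb_b[k] for k in order]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def subsample_scores(emb_a, emb_b, frac: float = 0.8,
                     n_rounds: int = 50, seed: int = 0) -> List[float]:
    """Similarity scores on random lexicon subsets drawn without replacement."""
    rng = random.Random(seed)
    n = len(emb_a)
    k = max(3, int(frac * n))
    scores = []
    for _ in range(n_rounds):
        idx = rng.sample(range(n), k)  # sampling without replacement
        scores.append(space_similarity([emb_a[i] for i in idx],
                                       [emb_b[i] for i in idx]))
    return scores


if __name__ == "__main__":
    # Toy data: the second "language" has identical geometry but reversed axes.
    rng = random.Random(42)
    emb_en = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(20)]
    emb_xx = [list(reversed(v)) for v in emb_en]
    print("similarity:", space_similarity(emb_en, emb_xx))
    print("p-value:", permutation_p_value(emb_en, emb_xx, n_perm=100))
```

Because only within-language similarities are correlated, no rotation or mapping between the two spaces has to be learned. Drawing subsets without replacement, as `subsample_scores` does, is usually called subsampling; the classical bootstrap draws with replacement.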