Semantic Similarity of Inverse Morpheme Words Based on Word Embedding
Jiaomei Zhou, Zhiying Liu
International Journal of Asian Language Processing, published 2022-03-08
DOI: 10.1142/s2717554521500065 (https://doi.org/10.1142/s2717554521500065)
Citations: 0
Abstract
Inverse morpheme words are pairs of compound words built from the same morphemes arranged in opposite order. Most related work has been limited to narrow examinations of dictionary definitions, and few studies have drawn on large-scale corpora. Starting from a base word list, we used the People’s Daily corpus (1946–2017) to add and delete entries, obtaining a list of 668 pairs of inverse morpheme words. We then computed the cosine similarity of each pair from word embeddings, that is, distributed representations of the words. The Pearson correlation coefficient between these cosine similarities and manually annotated similarity values is 0.907, indicating that the method measures the semantic similarity of inverse morpheme words in close agreement with human judgment. We also found that 76% of the inverse morpheme word pairs have a cosine similarity of 0.4 or higher, and that word formation, part of speech, and frequency all affect semantic similarity.
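The two measures named in the abstract, cosine similarity between embedding vectors and the Pearson correlation against human annotations, can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the three-dimensional toy vectors, the `human_scores` values, and the function names are hypothetical and are not taken from the paper, whose embeddings were trained on the People’s Daily corpus.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy embeddings: each entry holds the vectors of one word
# and of its inverse-morpheme counterpart (not real data).
pairs = [
    ([1.0, 2.0, 0.5], [1.0, 2.0, 0.5]),  # identical vectors
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # orthogonal vectors
    ([1.0, 1.0, 0.0], [1.0, 0.0, 0.0]),  # vectors 45 degrees apart
]

# Hypothetical human similarity annotations for the same three pairs.
human_scores = [0.9, 0.1, 0.6]

model_scores = [cosine_similarity(u, v) for u, v in pairs]

# Agreement between model similarities and human annotations.
r = pearson(model_scores, human_scores)

# Share of pairs at or above the 0.4 threshold mentioned in the abstract.
share_above = sum(s >= 0.4 for s in model_scores) / len(model_scores)

print(model_scores, r, share_above)
```

In a real replication, the toy vectors would be replaced by embeddings looked up for each of the 668 word pairs, and `human_scores` by the manual annotations.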