Word Embeddings for Romanian Language and Their Use for Synonyms Detection
M. Popescu, C. Rusu, L. Grama
2021 International Conference on Speech Technology and Human-Computer Dialogue (SpeD), 2021-10-13
DOI: 10.1109/sped53181.2021.9587432
This paper presents results on word embeddings for the Romanian language, obtained with the word2vec method. More concretely, we generate word embeddings of different lengths, using different preprocessing and training techniques. The embeddings are general purpose, and we use the Romanian-language version of Wikipedia as the corpus. We also evaluate the computational resources needed for the task. The embeddings are validated through experiments on synonym detection, using a new dataset created for this purpose. The code and the dataset are made publicly available. The results indicate that these types of embeddings can be used with summarization approaches.
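The abstract does not say how synonym candidates are scored, so the following is only a minimal sketch of one common approach: ranking words by the cosine similarity of their embedding vectors. The three-dimensional vectors, the function names, and the 0.8 threshold are all hypothetical and chosen for illustration; they are not taken from the paper, its code, or its dataset.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings, top_n=3):
    """Return the top_n other words ranked by cosine similarity to `word`."""
    target = embeddings[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def are_synonym_candidates(a, b, embeddings, threshold=0.8):
    """Flag a pair as synonym candidates when similarity reaches the threshold.

    The threshold is a hypothetical value, not one reported in the paper.
    """
    return cosine_similarity(embeddings[a], embeddings[b]) >= threshold


# Hypothetical toy vectors; real word2vec embeddings would be learned
# from a corpus and have far more dimensions.
toy_embeddings = {
    "casă": [0.9, 0.1, 0.0],
    "locuință": [0.8, 0.2, 0.1],
    "mașină": [0.1, 0.9, 0.2],
    "automobil": [0.2, 0.8, 0.3],
}

if __name__ == "__main__":
    print(most_similar("casă", toy_embeddings, top_n=2))
    print(are_synonym_candidates("mașină", "automobil", toy_embeddings))
```

With trained embeddings, the same ranking would be applied to vectors loaded from the model rather than to a hand-written dictionary, and the threshold would need to be tuned on labeled synonym pairs.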