Intrinsic Evaluation of Word Embeddings for Turkish

Hayri Volkan Agun, Ozgur Yilmazel
Published in: Proceedings of the 2020 4th International Symposium on Computer Science and Intelligent Control
Publication date: 2020-11-17
DOI: 10.1145/3440084.3441184 (https://doi.org/10.1145/3440084.3441184)
Citation count: 1

Abstract

Word embeddings are evaluated through intrinsic and extrinsic tests. Similarity and analogy tests are mainly preferred for intrinsic evaluation, while natural language processing tasks such as named entity recognition and question answering are preferred for extrinsic evaluation. Although there are various intrinsic evaluation datasets for English, the datasets for Turkish are very limited, and they measure the degree of similarity and relatedness between words without specifying the type of semantic relation. In this paper, we propose an intrinsic evaluation dataset for evaluating different semantic relations other than synonym, antonym, hypernym, and meronym relations, as well as morphological relations of individual Turkish words. Moreover, we benchmark three publicly available word-embedding models on the proposed dataset and discuss the agglutinative characteristics of the Turkish language for language modeling.
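
To make the idea of similarity and analogy tests concrete, the following is a minimal sketch of how such intrinsic tests are commonly scored: cosine similarity between word vectors, and the widely used vector-offset method for analogy questions. It is not the authors' evaluation protocol or dataset. The three-dimensional vectors, the small Turkish vocabulary, and the function names (`cosine`, `solve_analogy`, `analogy_accuracy`) are all hypothetical and hand-made for illustration; a real benchmark would load vectors from a trained embedding model.

```python
import math

# Toy, hand-made vectors (NOT taken from any real embedding model).
# They are chosen so that the plural suffix "-lar/-ler" acts as a shared offset.
EMBEDDINGS = {
    "kitap":    [1.0, 0.0, 0.0],  # book
    "kitaplar": [1.0, 1.0, 0.0],  # books
    "ev":       [0.0, 0.0, 1.0],  # house
    "evler":    [0.0, 1.0, 1.0],  # houses
    "araba":    [1.0, 0.0, 1.0],  # car (distractor)
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(a, b, c, embeddings):
    """Answer 'a is to b as c is to ?' with the vector-offset method.

    The target vector is b - a + c; the answer is the vocabulary word
    (excluding a, b, c) whose vector has the highest cosine similarity to it.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = [w for w in embeddings if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))


def analogy_accuracy(questions, embeddings):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(solve_analogy(a, b, c, embeddings) == expected
                  for a, b, c, expected in questions)
    return correct / len(questions)


if __name__ == "__main__":
    # Hypothetical morphological (singular to plural) analogy questions.
    questions = [
        ("kitap", "kitaplar", "ev", "evler"),
        ("ev", "evler", "kitap", "kitaplar"),
    ]
    print("similarity(kitap, kitaplar):",
          round(cosine(EMBEDDINGS["kitap"], EMBEDDINGS["kitaplar"]), 4))
    print("analogy accuracy:", analogy_accuracy(questions, EMBEDDINGS))
```

In a similarity test the cosine scores would typically be compared against human ratings with a rank correlation, whereas an analogy test reports accuracy per relation type, which is one way a dataset can separate semantic relations from morphological ones.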