Transmorph: a transformer based morphological disambiguator for Turkish

IF 1.2 4区 计算机科学 Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Turkish Journal of Electrical Engineering and Computer Sciences Pub Date : 2022-01-01 DOI:10.55730/1300-0632.3912
Hilal Özer, E. E. Korkmaz
{"title":"Transmorph: a transformer based morphological disambiguator for Turkish","authors":"Hilal Özer, E. E. Korkmaz","doi":"10.55730/1300-0632.3912","DOIUrl":null,"url":null,"abstract":": The agglutinative nature of the Turkish language has a complex morphological structure, and there are generally more than one parse for a given word. Before further processing, morphological disambiguation is required to determine the correct morphological analysis of a word. Morphological disambiguation is one of the first and crucial steps in natural language processing since its success determines later analyses. In our proposed morphological disambiguation method, we used a transformer-based sequence-to-sequence neural network architecture. Transformers are commonly used in various NLP tasks, and they produce state-of-the-art results in machine translation. However, to the best of our knowledge, transformer-based encoder-decoders have not been studied in morphological disambiguation. In this study, in addition to character level tokenization, three input subword representations are evaluated, which are unigram, bytepair, and wordpiece tokenization methods. We have achieved the best accuracy with character input representation which is 96.25%. Although the proposed model is developed for Turkish language, it is not language-dependent, so it can be applied to a larger set of languages.","PeriodicalId":49410,"journal":{"name":"Turkish Journal of Electrical Engineering and Computer Sciences","volume":"30 1","pages":"1897-1913"},"PeriodicalIF":1.2000,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Turkish Journal of Electrical Engineering and Computer Sciences","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.55730/1300-0632.3912","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 1

Abstract

: The agglutinative nature of the Turkish language has a complex morphological structure, and there are generally more than one parse for a given word. Before further processing, morphological disambiguation is required to determine the correct morphological analysis of a word. Morphological disambiguation is one of the first and crucial steps in natural language processing since its success determines later analyses. In our proposed morphological disambiguation method, we used a transformer-based sequence-to-sequence neural network architecture. Transformers are commonly used in various NLP tasks, and they produce state-of-the-art results in machine translation. However, to the best of our knowledge, transformer-based encoder-decoders have not been studied in morphological disambiguation. In this study, in addition to character level tokenization, three input subword representations are evaluated, which are unigram, bytepair, and wordpiece tokenization methods. We have achieved the best accuracy with character input representation which is 96.25%. Although the proposed model is developed for Turkish language, it is not language-dependent, so it can be applied to a larger set of languages.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
变形:一个基于变形的土耳其语形态消歧器
土耳其语的黏着性质具有复杂的形态结构,并且通常对给定的单词有不止一种解析。在进一步处理之前,需要进行词形消歧,以确定单词的正确词形分析。形态消歧是自然语言处理的第一步和关键步骤之一,因为它的成功决定了以后的分析。在我们提出的形态消歧方法中,我们使用了基于变压器的序列到序列神经网络架构。变压器通常用于各种NLP任务,它们在机器翻译中产生最先进的结果。然而,据我们所知,基于变换的编码器-解码器尚未在形态学消歧中进行研究。在本研究中,除了字符级标记化之外,还评估了三种输入子词表示,即单字符、字节对和词块标记化方法。我们在字符输入表示方面取得了最好的准确率,达到96.25%。虽然建议的模型是为土耳其语开发的,但它不依赖于语言,因此它可以应用于更大的语言集。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Turkish Journal of Electrical Engineering and Computer Sciences
Turkish Journal of Electrical Engineering and Computer Sciences COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-ENGINEERING, ELECTRICAL & ELECTRONIC
CiteScore
2.90
自引率
9.10%
发文量
95
审稿时长
6.9 months
期刊介绍: The Turkish Journal of Electrical Engineering & Computer Sciences is published electronically 6 times a year by the Scientific and Technological Research Council of Turkey (TÜBİTAK) Accepts English-language manuscripts in the areas of power and energy, environmental sustainability and energy efficiency, electronics, industry applications, control systems, information and systems, applied electromagnetics, communications, signal and image processing, tomographic image reconstruction, face recognition, biometrics, speech processing, video processing and analysis, object recognition, classification, feature extraction, parallel and distributed computing, cognitive systems, interaction, robotics, digital libraries and content, personalized healthcare, ICT for mobility, sensors, and artificial intelligence. Contribution is open to researchers of all nationalities.
期刊最新文献
A comparative study of YOLO models and a transformer-based YOLOv5 model for mass detection in mammograms Feature selection optimization with filtering and wrapper methods: two disease classification cases New modified carrier-based level-shifted PWM control for NPC rectifiers considered for implementation in EV fast chargers FuzzyCSampling: A Hybrid fuzzy c-means clustering sampling strategy for imbalanced datasets A practical low-dimensional feature vector generation method based on wavelet transform for psychophysiological signals
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1