{"title":"一个新颖的土耳其语自然语言处理工具,用于词干提取,形态标记和动词否定","authors":"","doi":"10.34028/iajit/18/2/3","DOIUrl":null,"url":null,"abstract":"GovdeTurk is a tool for stemming, morphological labeling and verb negation for Turkish language. We designed comprehensive finite automata to represent Turkish grammar rules. Based on these automata, GovdeTurk finds the stem of the word by removing the inflectional suffixes in a longest match strategy. Levenshtein Distance is used to correct spelling errors that may occur during suffix removal. Morphological labeling identifies the functionality of a given token. Nine different dictionaries are constructed for each specific word type. These dictionaries are used in the stemming and morphological labeling. Verb negation module is developed for lexicon based sentiment analysis. GovdeTurk is tested on a dataset of one million words. The results are compared with Zemberek and Turkish Snowball Algorithm. While the closest competitor, Zemberek, in the stemming step has an accuracy of 80%, GovdeTurk gives 97.3% of accuracy. Morphological labeling accuracy of GovdeTurk is 93.6%. With outperforming results, our model becomes foremost among its competitors","PeriodicalId":161392,"journal":{"name":"The International Arab Journal of Information Technology","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"GovdeTurk: A Novel Turkish Natural Language Processing Tool for Stemming, Morphological Labelling and Verb Negation\",\"authors\":\"\",\"doi\":\"10.34028/iajit/18/2/3\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"GovdeTurk is a tool for stemming, morphological labeling and verb negation for Turkish language. We designed comprehensive finite automata to represent Turkish grammar rules. Based on these automata, GovdeTurk finds the stem of the word by removing the inflectional suffixes in a longest match strategy. Levenshtein Distance is used to correct spelling errors that may occur during suffix removal. Morphological labeling identifies the functionality of a given token. Nine different dictionaries are constructed for each specific word type. These dictionaries are used in the stemming and morphological labeling. Verb negation module is developed for lexicon based sentiment analysis. GovdeTurk is tested on a dataset of one million words. The results are compared with Zemberek and Turkish Snowball Algorithm. While the closest competitor, Zemberek, in the stemming step has an accuracy of 80%, GovdeTurk gives 97.3% of accuracy. Morphological labeling accuracy of GovdeTurk is 93.6%. With outperforming results, our model becomes foremost among its competitors\",\"PeriodicalId\":161392,\"journal\":{\"name\":\"The International Arab Journal of Information Technology\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The International Arab Journal of Information Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.34028/iajit/18/2/3\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The International Arab Journal of Information Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.34028/iajit/18/2/3","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
GovdeTurk: A Novel Turkish Natural Language Processing Tool for Stemming, Morphological Labelling and Verb Negation
GovdeTurk is a tool for stemming, morphological labeling and verb negation for Turkish language. We designed comprehensive finite automata to represent Turkish grammar rules. Based on these automata, GovdeTurk finds the stem of the word by removing the inflectional suffixes in a longest match strategy. Levenshtein Distance is used to correct spelling errors that may occur during suffix removal. Morphological labeling identifies the functionality of a given token. Nine different dictionaries are constructed for each specific word type. These dictionaries are used in the stemming and morphological labeling. Verb negation module is developed for lexicon based sentiment analysis. GovdeTurk is tested on a dataset of one million words. The results are compared with Zemberek and Turkish Snowball Algorithm. While the closest competitor, Zemberek, in the stemming step has an accuracy of 80%, GovdeTurk gives 97.3% of accuracy. Morphological labeling accuracy of GovdeTurk is 93.6%. With outperforming results, our model becomes foremost among its competitors