Title: Arabic Word stemming Based on Pattern Affixes Removal
Author: Sari Awwad
Venue: 2019 10th International Conference on Information and Communication Systems (ICICS)
Publication date: 2019-06-11
DOI: 10.1109/IACS.2019.8809169 (https://doi.org/10.1109/IACS.2019.8809169)
Citations: 1
Abstract
Stemming Arabic words has become an essential part of text processing algorithms and information retrieval. The main challenge is to distinguish affixes from original characters in an Arabic term, because the same character can be an affix in some terms and an original character in others. The goal of this research is to determine to what extent finding the stem of an Arabic term can depend on affix stripping. The contribution consists of two parts. The first is removing all kinds of affixes from the Arabic term, which is done by constructing affix hash tables. The second is producing 24 possible stems for the same Arabic term by using 24 stripping orders. The experiments showed that at least one of the 24 possible stems is correct. The conclusion is that the most efficient stripping orders are those that begin by removing prefixes, followed by removing infixes, and then removing suffixes. The dataset used for testing consists of documents on four different subjects with 2000 Arabic words. The final results for these stripping orders reached up to 86% correctness, the highest percentage compared with the other stripping orders.
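The idea of trying every stripping order can be illustrated with a minimal Python sketch. It is not the paper's implementation. Since 24 equals 4!, the sketch assumes four affix classes; the abstract names only prefixes, infixes, and suffixes, so the fourth class here (`ARTICLES`) is a hypothetical stand-in. The table contents, the `MIN_STEM` length guard, and all function names are likewise illustrative assumptions.

```python
from itertools import permutations

# Hypothetical affix tables (Python sets are hash-based lookups).
# The entries are illustrative only, not the tables built in the paper.
ARTICLES = {"ال"}
PREFIXES = {"و", "ف", "ب"}
INFIXES = {"ا", "و"}
SUFFIXES = {"ون", "ات", "ة"}
MIN_STEM = 3  # assumed guard: never strip below three letters


def strip_start(word, table):
    """Remove the longest table entry found at the start of the word."""
    for affix in sorted(table, key=len, reverse=True):
        if word.startswith(affix) and len(word) - len(affix) >= MIN_STEM:
            return word[len(affix):]
    return word


def strip_end(word, table):
    """Remove the longest table entry found at the end of the word."""
    for affix in sorted(table, key=len, reverse=True):
        if word.endswith(affix) and len(word) - len(affix) >= MIN_STEM:
            return word[:-len(affix)]
    return word


def strip_inside(word, table):
    """Remove the first interior character that appears in the table."""
    if len(word) - 1 < MIN_STEM:
        return word
    for i in range(1, len(word) - 1):
        if word[i] in table:
            return word[:i] + word[i + 1:]
    return word


# One stripping step per assumed affix class.
STEPS = {
    "article": lambda w: strip_start(w, ARTICLES),
    "prefix": lambda w: strip_start(w, PREFIXES),
    "infix": lambda w: strip_inside(w, INFIXES),
    "suffix": lambda w: strip_end(w, SUFFIXES),
}


def candidate_stems(word):
    """Map each of the 4! = 24 stripping orders to the stem it produces."""
    stems = {}
    for order in permutations(STEPS):
        stem = word
        for name in order:
            stem = STEPS[name](stem)
        stems[order] = stem
    return stems


if __name__ == "__main__":
    for order, stem in candidate_stems("المعلمون").items():
        print(" > ".join(order), stem)
```

With these toy tables the order of the steps changes the result: for "المعلمون", stripping the suffix before the infix yields "معلم", while stripping the infix first removes the "و" of the suffix and leaves "معلمن". This only shows why different orders produce different candidate stems; it does not reproduce the paper's ranking of orders or its reported accuracy.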